The stepwise dimensionality increasing (SDI) index for high-dimensional data

Document Type

Article

Publication Date

9-1-2006

Abstract

Similarity search is a powerful paradigm for image and multimedia databases, time series databases, and DNA and protein sequence databases. Objects are represented by high-dimensional feature vectors, based, for example, on color, texture, and shape in the case of images. Object similarity is usually implemented via k-nearest neighbor (k-NN) queries, determined by the distance between the endpoints of the feature vectors. The cost of processing k-NN queries via a sequential scan increases with the number of objects and the number of dimensions. Multi-dimensional indexing structures can be used to improve the efficiency of k-NN query processing, but they lose their effectiveness as the dimensionality increases. The curse of dimensionality manifests itself as increased overlap among the nodes of the index, so that a large fraction of index pages is touched in processing k-NN queries. The increased dimensionality also results in a reduced fanout and an increased index height. We propose the stepwise dimensionality increasing (SDI)-tree index, which aims to reduce both the number of disk accesses and the CPU processing cost. The index is built using feature vectors transformed via principal component analysis (PCA). Dimensions are retained in non-increasing order of their variance according to a parameter p, which specifies the incremental fraction of variance at each level of the index. The optimal value for p is determined experimentally. Experiments on three datasets have shown that SDI-trees access fewer disk pages and incur less CPU time than SR-trees, VAMSR-trees, vector approximation (VA)-Files, and the recently proposed iDistance method. In CPU time, SDI outperforms the sequential scan and the OMNI methods. © 2006 Oxford University Press.
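The abstract does not spell out how p maps to the number of dimensions kept at each level. The following is a minimal sketch under stated assumptions, not the authors' algorithm: level i is assumed to keep the smallest number of leading principal components whose cumulative variance reaches min(1, i*p) of the total, and the final level is assumed to keep every dimension. The per-component variances are taken as given (for instance, the eigenvalues from a PCA of the feature vectors), and the function name `dims_per_level` is hypothetical.

```python
from typing import List, Sequence


def dims_per_level(variances: Sequence[float], p: float) -> List[int]:
    """Hypothetical SDI-style schedule: leading PCA dimensions kept per level.

    Assumption (not from the paper): level i retains the fewest
    highest-variance components whose cumulative variance is at least
    min(1, i * p) of the total; the last level retains all dimensions.
    Levels that would not add a dimension are skipped.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Retain dimensions in non-increasing order of variance.
    ordered = sorted(variances, reverse=True)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total <= 0:
        raise ValueError("need at least one dimension with positive variance")

    schedule: List[int] = []
    cumulative = 0.0
    k = 0
    level = 1
    while True:
        target = min(1.0, level * p)
        if target >= 1.0:
            k = n  # assumed final level: keep every dimension
        else:
            # Add components, largest variance first, until the target
            # fraction of total variance is reached.
            while k < n and cumulative < target * total:
                cumulative += ordered[k]
                k += 1
        # Keep the schedule strictly increasing (stepwise growth).
        if not schedule or k > schedule[-1]:
            schedule.append(k)
        if k == n:
            break
        level += 1
    return schedule
```

For example, with component variances 4, 3, 2, 1 and p = 0.25, this sketch yields the schedule [1, 2, 3, 4], since each successive level must cover roughly another quarter of the total variance. A smaller p produces more, finer-grained levels, which is one way to read the paper's observation that the best p has to be found experimentally.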

Identifier

33750133782 (Scopus)

Publication Title

Computer Journal

External Full Text Location

https://doi.org/10.1093/comjnl/bxl022

e-ISSN

1460-2067

ISSN

0010-4620

First Page

609

Last Page

618

Issue

5

Volume

49
