Persistent clustered main memory index for accelerating k-NN queries on high dimensional datasets
Document Type
Article
Publication Date
6-1-2008
Abstract
Similarity search implemented via k-nearest neighbor (k-NN) queries on multidimensional indices is an extremely useful paradigm for content-based image retrieval. As the dimensionality of feature vectors increases, the curse of dimensionality sets in, i.e., the performance of k-NN search on disk-resident indices in the R-tree family degrades rapidly due to the overlap of index pages in high dimensions. This problem is dealt with in this study by utilizing the double filtering effect of clustering and indexing. The clustering algorithm ensures that the largest cluster fits into main memory and that only the clusters closest to a query point need to be searched and hence loaded into main memory. We organize the data in each cluster according to the ordered-partition (OP) tree, a main memory resident index that is not prone to the curse of dimensionality and is highly efficient for processing k-NN queries. We serialize an OP-tree by writing its dynamically allocated nodes into contiguous memory locations, optimize its parameters, and make it persistent by writing it to disk. Reading and writing the clusters constituting an OP-tree with a single sequential disk access benefits from the higher data transfer rates of modern disk drives. The performance of the index is further improved by applying the Karhunen-Loève transformation (KLT) to the dataset, since this results in a more efficient computation of distances for k-NN queries. We compare OP-trees and sequential scans with and without the KLT, and with and without a shortcut method for calculating Euclidean distances. A comparison against the OMNI-sequential scan is also reported. We finally compare a clustered and persistent version of the OP-tree against a clustered version of the SR-tree and the VA-file method. CPU time is measured and elapsed time is estimated in this study.
It is observed that the OP-tree index outperforms the other two methods and that the improvement increases with the number of dimensions. © 2007 Springer Science+Business Media, LLC.
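The abstract does not define the shortcut method for Euclidean distances. One common reading, assumed here, is early termination: accumulation of the squared distance stops once the partial sum reaches the current k-th best distance. The function name `knn_scan_with_shortcut` and the sample data are hypothetical, and this is a minimal sketch of a sequential scan, not the OP-tree or the authors' implementation.

```python
import heapq


def knn_scan_with_shortcut(points, query, k):
    """Sequential k-NN scan with early termination of distance sums.

    Returns up to k (squared_distance, index) pairs, nearest first.
    Assumption: the "shortcut" is read as partial-distance pruning.
    """
    # Max-heap emulated with negated squared distances; holds at most k entries.
    heap = []
    for idx, point in enumerate(points):
        # Current pruning bound: the k-th best squared distance seen so far.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        abandoned = False
        for a, b in zip(point, query):
            diff = a - b
            partial += diff * diff
            if partial >= bound:
                # This point cannot improve the result; skip remaining dimensions.
                abandoned = True
                break
        if abandoned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-partial, idx))
        else:
            heapq.heapreplace(heap, (-partial, idx))
    return sorted((-neg_dist, idx) for neg_dist, idx in heap)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0)]
    print(knn_scan_with_shortcut(data, (0.0, 0.0), 2))
```

Under this reading, a KLT would help because it orders dimensions by decreasing variance, so the partial sum tends to reach the bound after fewer dimensions.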
Identifier
43149101633 (Scopus)
Publication Title
Multimedia Tools and Applications
External Full Text Location
https://doi.org/10.1007/s11042-007-0179-7
e-ISSN
15737721
ISSN
13807501
First Page
253
Last Page
270
Issue
2
Volume
38
Grant
0105485
Fund Ref
National Science Foundation
Recommended Citation
Thomasian, Alexander and Zhang, Lijuan, "Persistent clustered main memory index for accelerating k-NN queries on high dimensional datasets" (2008). Faculty Publications. 12787.
https://digitalcommons.njit.edu/fac_pubs/12787
