The Influence of Hubness on NN-Descent
Document Type
Article
Publication Date
9-1-2019
Abstract
The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, producing the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but has a major drawback in that K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes bad NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that our approaches are more accurate, but often at the cost of higher scan rates.
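To illustrate the two notions the abstract relies on, the following sketch (not from the paper; names and parameters are illustrative) builds a K-NNG by brute force, showing the quadratic cost, and measures hubness as the in-degree distribution of the graph:

```python
import numpy as np

def knn_graph(X, k):
    """Brute-force K-NNG: for each point, the indices of its k nearest
    neighbors. Requires all pairwise distances, hence O(n^2) work --
    the quadratic cost that motivates approximations like NN-Descent."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]  # n x k array of neighbor indices

def in_degrees(nn):
    """In-degree of each point: how often it appears in other points'
    neighbor lists. Hubs are points with unusually large in-degree."""
    return np.bincount(nn.ravel(), minlength=nn.shape[0])

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))  # high-dimensional data tends to form hubs
deg = in_degrees(knn_graph(X, k=10))
```

In intrinsically high-dimensional data such as the Gaussian sample above, the in-degree distribution becomes skewed: a few hub points accumulate many in-links while most points have few, which is the skew the paper links to NN-Descent's accuracy loss.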
Identifier
85072953049 (Scopus)
Publication Title
International Journal on Artificial Intelligence Tools
External Full Text Location
https://doi.org/10.1142/S0218213019600029
e-ISSN
1793-6349
ISSN
0218-2130
Issue
6
Volume
28
Grant
DGE 1565478
Fund Ref
National Science Foundation
Recommended Citation
Bratić, Brankica; Houle, Michael E.; Kurbalija, Vladimir; Oria, Vincent; and Radovanović, Miloš, "The Influence of Hubness on NN-Descent" (2019). Faculty Publications. 7368.
https://digitalcommons.njit.edu/fac_pubs/7368
