A generic framework for efficient computation of top-k diverse results
Document Type
Article
Publication Date
7-1-2023
Abstract
Result diversification is extensively studied in the context of search, recommendation, and data exploration. There are numerous algorithms that return top-k results that are both diverse and relevant. These algorithms typically have computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive DivGetBatch() that replaces repeated pairwise comparisons of the diversity scores of individual records with pairwise comparisons of “aggregate” diversity scores of groups of records, thereby improving the running time of these algorithms while preserving the same results. We integrate the access primitive into three representative diversity algorithms and prove that the augmented algorithms preserve the original results. We analyze the worst-case and expected-case running times of these algorithms. We propose a computational framework for designing this access primitive, built on a pre-computed index structure, I-tree, that is agnostic to the specific details of the diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings and show speedups of up to 24×.
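The central idea of the abstract, comparing an aggregate score for a group of records instead of scoring every record, can be illustrated with a minimal sketch. This is not the paper's DivGetBatch() or I-tree: it assumes a greedy max-min diversification loop, Euclidean distance, and a hypothetical grouping (`make_groups`) in which each group stores a centroid and radius. The triangle inequality then gives an upper bound on the diversity score of any record in a group, so a whole group can be skipped without changing the result. All function names below are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def diverse_baseline(points: Sequence[Point], k: int) -> List[int]:
    """Greedy max-min selection: score every remaining record pairwise."""
    if not points:
        return []
    k = min(k, len(points))
    selected = [0]  # assumption: seed with the first record
    while len(selected) < k:
        chosen = set(selected)
        best_i, best_score = -1, -1.0
        for i in range(len(points)):
            if i in chosen:
                continue
            score = min(math.dist(points[i], points[s]) for s in selected)
            if score > best_score:  # ties keep the lowest index
                best_i, best_score = i, score
        selected.append(best_i)
    return selected


def make_groups(points: Sequence[Point], size: int) -> List[Tuple[Point, float, List[int]]]:
    """Hypothetical grouping: sort by first coordinate, chunk, store (centroid, radius, members)."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    groups = []
    for start in range(0, len(order), size):
        members = order[start:start + size]
        dims = len(points[members[0]])
        centroid = tuple(sum(points[i][d] for i in members) / len(members) for d in range(dims))
        radius = max(math.dist(centroid, points[i]) for i in members)
        groups.append((centroid, radius, members))
    return groups


def diverse_batched(points: Sequence[Point], groups, k: int) -> List[int]:
    """Same greedy loop, but skip a group when its aggregate bound cannot win."""
    if not points:
        return []
    k = min(k, len(points))
    selected = [0]
    eps = 1e-9  # slack so floating-point rounding never prunes a true winner
    while len(selected) < k:
        chosen = set(selected)
        best_i, best_score = -1, -1.0
        for centroid, radius, members in groups:
            # Triangle inequality: every member's score is at most this bound.
            bound = min(math.dist(centroid, points[s]) for s in selected) + radius
            if bound + eps < best_score:
                continue  # no record in this group can beat the current best
            for i in members:
                if i in chosen:
                    continue
                score = min(math.dist(points[i], points[s]) for s in selected)
                # Same tie-breaking as the baseline: higher score, then lower index.
                if score > best_score or (score == best_score and i < best_i):
                    best_i, best_score = i, score
        selected.append(best_i)
    return selected
```

Because a group is skipped only when its bound is strictly below the current best score, and ties are broken by index exactly as in the baseline, both functions return the same list of indices. The batched version simply performs fewer per-record comparisons when groups are tight.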
Identifier
85142881566 (Scopus)
Publication Title
VLDB Journal
External Full Text Location
https://doi.org/10.1007/s00778-022-00770-0
e-ISSN
0949-877X
ISSN
1066-8888
First Page
737
Last Page
761
Issue
4
Volume
32
Grant
1814595
Fund Ref
Center for Selective C-H Functionalization, National Science Foundation
Recommended Citation
Islam, Md Mouinul; Asadi, Mahsa; Amer-Yahia, Sihem; and Roy, Senjuti Basu, "A generic framework for efficient computation of top-k diverse results" (2023). Faculty Publications. 1595.
https://digitalcommons.njit.edu/fac_pubs/1595