Proportionate Diversification of Top-k LLM Results using Database Queries
Document Type
Conference Proceeding
Publication Date
1-1-2023
Abstract
Result diversification aims to return relevant results that cover a variety of perspectives. Attribute-based diversification groups results by shared attributes (e.g., genre for movies) and selects a proportional number of items from each group based on their distribution in the underlying data. However, large language models (LLMs) are not designed to produce proportionally diverse results. In this work, we propose leveraging external data sources to determine the distribution of groups related to a query and prompting LLMs to produce proportionally diverse results. This can improve result diversity by representing groups in proportion to their prevalence. Specifically, we first argue for the benefits of making top-k results from LLMs proportionally diverse. We then show how to use external benchmark databases to enable proportional diversity. Finally, we outline a framework that prompts LLMs with proportionality information from external data and discuss challenges in automating this process. Our approach provides a path to overcoming LLMs' limitations in producing proportionally diverse responses.
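The idea in the abstract (read the group distribution from an external database, split k across groups in proportion, then tell the LLM how many items to return per group) can be illustrated with a minimal sketch. This is not the authors' implementation: the table name `movies`, the column `genre`, the helper names, the largest-remainder rounding rule, and the prompt wording are all assumptions made for illustration.

```python
import sqlite3


def group_counts(conn, table, attribute):
    """Count rows per attribute value with a GROUP BY query.

    Assumption: `table` and `attribute` are trusted constants, since
    identifiers cannot be bound as SQL parameters.
    """
    sql = f"SELECT {attribute}, COUNT(*) FROM {table} GROUP BY {attribute}"
    return dict(conn.execute(sql).fetchall())


def proportional_quota(counts, k):
    """Split k slots across groups in proportion to their counts.

    Uses largest-remainder rounding (an assumed choice): take the floor of
    each exact share, then give leftover slots to the groups with the
    largest fractional parts. Ties are broken by group name.
    """
    total = sum(counts.values())
    exact = {g: k * c / total for g, c in counts.items()}
    quota = {g: int(share) for g, share in exact.items()}
    leftover = k - sum(quota.values())
    by_remainder = sorted(exact, key=lambda g: (-(exact[g] - quota[g]), g))
    for g in by_remainder[:leftover]:
        quota[g] += 1
    return quota


def build_prompt(query, attribute, quota):
    """Append per-group counts to the user query (hypothetical wording)."""
    parts = [f"{n} with {attribute} {g}" for g, n in sorted(quota.items()) if n > 0]
    k = sum(quota.values())
    return f"{query} Return exactly {k} results: " + "; ".join(parts) + "."


if __name__ == "__main__":
    # Toy stand-in for an external benchmark database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT, genre TEXT)")
    rows = [(f"m{i}", g) for i, g in enumerate(
        ["drama"] * 5 + ["comedy"] * 3 + ["horror"] * 2)]
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)

    quota = proportional_quota(group_counts(conn, "movies", "genre"), k=4)
    print(build_prompt("List popular movies.", "genre", quota))
```

With the toy data above (5 drama, 3 comedy, 2 horror) and k = 4, the quota works out to 2 drama, 1 comedy, and 1 horror. The resulting prompt string would then be sent to whichever LLM is in use; that call is left out because it depends on the provider.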
Identifier
85171260694 (Scopus)
Publication Title
CEUR Workshop Proceedings
ISSN
1613-0073
Volume
3462
Grant
1814595
Fund Ref
National Science Foundation
Recommended Citation
On, Thinh; Ghosh, Subhodeep; Du, Mengnan; and Roy, Senjuti Basu, "Proportionate Diversification of Top-k LLM Results using Database Queries" (2023). Faculty Publications. 2041.
https://digitalcommons.njit.edu/fac_pubs/2041