Dissertations

Improving document representation by accumulating relevance feedback : the relevance feedback accumulation (RFA) algorithm

Razvan Stefan Bot, New Jersey Institute of Technology

Document Type

Dissertation

Date of Award

Spring 5-31-2005

Degree Name

Doctor of Philosophy in Information Systems - (Ph.D.)

Department

Information Systems

First Advisor

Yi-Fang Brook Wu

Second Advisor

Murray Turoff

Third Advisor

Vincent Oria

Fourth Advisor

Nicholas J. Belkin

Fifth Advisor

Bartel Albrecht Van de Walle

Abstract

Document representation (indexing) techniques are dominated by variants of the term-frequency analysis approach, which assumes that the more often a term occurs in a document, the more important that term is to the document. Inherent drawbacks of this approach include poor index quality, large document representations, and the word mismatch problem. To address these drawbacks, this dissertation presents a document representation improvement method called the Relevance Feedback Accumulation (RFA) algorithm. The algorithm provides a mechanism for continuously accumulating relevance assessments over time and across users. It also provides a document representation modification function, or document representation learning function, that gradually improves the quality of the document representations. To do so, the learning function analyzes the accumulated relevance feedback using "support", a measure from data mining.
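The idea of learning a representation from accumulated feedback via "support" can be illustrated with a minimal sketch. This is not the RFA algorithm as specified in the dissertation; it assumes that each positive relevance judgment pairs a document with the query terms that retrieved it, and that a term's support for a document is the fraction of that document's relevant queries containing the term (the usual association-rule definition). The class `FeedbackAccumulator`, its methods, and the `min_support` threshold are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


class FeedbackAccumulator:
    """Hypothetical sketch: accumulate relevance feedback per document and
    derive a compact representation from term support. Not the author's code."""

    def __init__(self, min_support: float = 0.3):
        # Assumed threshold: terms at or above this support are kept.
        self.min_support = min_support
        # doc_id -> list of query term sets judged relevant to that document
        self.relevant_queries: dict[str, list[frozenset[str]]] = defaultdict(list)

    def add_feedback(self, doc_id: str, query: str, relevant: bool) -> None:
        # Feedback from any user at any time is appended; in this sketch
        # only positive judgments contribute to the learned representation.
        if relevant:
            self.relevant_queries[doc_id].append(frozenset(query.lower().split()))

    def support(self, doc_id: str) -> dict[str, float]:
        # support(t, d) = (# relevant queries for d containing t) / (# relevant queries for d)
        queries = self.relevant_queries.get(doc_id, [])
        if not queries:
            return {}
        counts = Counter(term for q in queries for term in q)
        return {term: n / len(queries) for term, n in counts.items()}

    def representation(self, doc_id: str) -> set[str]:
        # Keep only terms whose support meets the assumed threshold.
        return {t for t, s in self.support(doc_id).items() if s >= self.min_support}


acc = FeedbackAccumulator(min_support=0.5)
acc.add_feedback("d1", "neural retrieval models", True)
acc.add_feedback("d1", "retrieval evaluation", True)
acc.add_feedback("d1", "cooking recipes", False)
acc.add_feedback("d1", "dense retrieval", True)
print(sorted(acc.representation("d1")))  # prints ['retrieval']
```

In this toy run, "retrieval" appears in all three relevant queries (support 1.0), while every other term appears in only one (support 1/3), so only "retrieval" survives the 0.5 threshold. This shows, under these assumptions, how accumulated feedback can shrink a representation to the terms users actually associate with a document.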

Evaluation is done by comparing the RFA algorithm with four other algorithms on four measures: (a) the average number of index terms per document; (b) the quality of the document representations as assessed by human judges; (c) retrieval effectiveness; and (d) the quality of the document representation learning function. The evaluation results show that (1) the algorithm substantially reduces document representation size while maintaining retrieval effectiveness; (2) the algorithm provides a smooth and steady document representation learning function; and (3) the algorithm improves the quality of the document representations. The RFA algorithm's approach is consistent with the efficiency considerations that apply to real information retrieval systems.

The major contribution made by this research is the design and implementation of a novel, simple, efficient, and scalable technique for document representation improvement.

Recommended Citation

Bot, Razvan Stefan, "Improving document representation by accumulating relevance feedback : the relevance feedback accumulation (RFA) algorithm" (2005). Dissertations. 727.
https://digitalcommons.njit.edu/dissertations/727


Included in

Databases and Information Systems Commons, Management Information Systems Commons
