Date of Award

Fall 1994

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer and Information Science

First Advisor

Peter A. Ng

Second Advisor

Jason T. L. Wang

Third Advisor

James A. McHugh

Fourth Advisor

Murray Turoff

Fifth Advisor

Raymond Tzuu-Yau Yeh

Abstract

TEXPROS (TEXt PROcessing System) is an intelligent document processing system. The system combines filing and retrieval systems and supports storing, classifying, categorizing, retrieving and reproducing documents, as well as extracting, browsing, retrieving and synthesizing information from a variety of documents. This dissertation presents a retrieval system for TEXPROS that is capable of processing incomplete or vague queries and providing semantically meaningful responses to users. The design of the retrieval system integrates several mechanisms for achieving these goals. First, a system catalog, including a thesaurus, is used to store knowledge about the database. Second, there is a query transformation mechanism, which consists of context construction and algebraic query formulation modules. Given an incomplete query, the context construction module searches the system for the required terms and constructs a query that has a complete representation. The resulting query is then formulated into an algebraic query. Third, in practice, the user may not have a precise notion of what he is looking for. A browsing mechanism is employed in such situations to assist the user in the retrieval process. With the browser, the user can enter vague queries until enough information has been obtained to construct a query for his request. Finally, when a query returns an empty answer, a query generalization mechanism is used to give the user a cooperative explanation for the empty answer. The generalizations of a failed query (i.e., one with an empty answer) are derived by applying folder and type substitutions and by weakening the search criteria in the original query.
An efficient method is investigated for determining whether the empty answer is genuine and whether the original query reflects erroneous presuppositions, so that any failed query can be answered with a meaningful and cooperative response. The method incorporates a methodical approach to reducing the search space of generalized subqueries: it analyzes the results of executing the query generalization and efficiently applies the possible substitutions in a query to generate a small subset of relevant subqueries to be evaluated.
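
The generalization idea described above can be illustrated with a small sketch. It is not the dissertation's algorithm or the TEXPROS query language. The `Query` class, the folder and type hierarchies, the sample documents, and all function names are hypothetical, chosen only to show how folder substitution, type substitution, and weakened search criteria can point to the part of a failed query that caused the empty answer.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical hierarchies (child -> parent); not taken from TEXPROS.
FOLDER_PARENT: Dict[str, str] = {"PhD Theses": "Theses", "Theses": "All Documents"}
TYPE_PARENT: Dict[str, str] = {"Dissertation": "Publication", "Publication": "Document"}


@dataclass(frozen=True)
class Query:
    folder: str
    doc_type: str
    criteria: FrozenSet[Tuple[str, str]]  # (attribute, value) pairs; all must match


def ancestors(name: str, parent: Dict[str, str]) -> List[str]:
    """Return name followed by all of its ancestors in the hierarchy."""
    chain = [name]
    while name in parent:
        name = parent[name]
        chain.append(name)
    return chain


def evaluate(q: Query, docs: List[dict]) -> List[dict]:
    """Return documents whose folder and type fall under the query's and that meet every criterion."""
    return [
        d for d in docs
        if q.folder in ancestors(d["folder"], FOLDER_PARENT)
        and q.doc_type in ancestors(d["type"], TYPE_PARENT)
        and all(d["attrs"].get(attr) == value for attr, value in q.criteria)
    ]


def generalize(q: Query) -> List[Query]:
    """One-step generalizations: folder substitution, type substitution, dropping one criterion."""
    out = []
    if q.folder in FOLDER_PARENT:
        out.append(replace(q, folder=FOLDER_PARENT[q.folder]))
    if q.doc_type in TYPE_PARENT:
        out.append(replace(q, doc_type=TYPE_PARENT[q.doc_type]))
    for criterion in sorted(q.criteria):
        out.append(replace(q, criteria=q.criteria - {criterion}))
    return out


def successful_generalizations(q: Query, docs: List[dict]) -> List[Tuple[Query, List[dict]]]:
    """Pair each one-step generalization that returns a non-empty answer with its results."""
    hits = []
    for g in generalize(q):
        answer = evaluate(g, docs)
        if answer:
            hits.append((g, answer))
    return hits


# Made-up sample data.
DOCS = [
    {"folder": "PhD Theses", "type": "Dissertation", "attrs": {"author": "Lee", "year": "1994"}},
    {"folder": "Theses", "type": "Publication", "attrs": {"author": "Kim", "year": "1993"}},
]

failed = Query("PhD Theses", "Dissertation", frozenset({("author", "Lee"), ("year", "1993")}))
hits = successful_generalizations(failed, DOCS)
```

In this toy data the original query is empty, and the only one-step generalization that succeeds is the one that drops the `year` criterion. That suggests the year was the erroneous presupposition, rather than the folder or the document type. The sketch enumerates every one-step generalization; it does not attempt the search-space reduction that the abstract describes.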
