Date of Award

Spring 1993

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer and Information Science

First Advisor

Peter A. Ng

Second Advisor

Jason T. L. Wang

Third Advisor

James A. McHugh

Fourth Advisor

Murray Turoff

Abstract

This dissertation presents a data model (called D_model) and an algebra (called D_algebra) for office documents. The data model adopts a natural view of office documents. Documents are grouped into classes; each class is characterized by a "frame template", which describes the properties (or attributes) of that class of documents. A frame template is instantiated by supplying values, forming a "frame instance" that serves as the synopsis of a document in the class associated with that template. Different frame instances can be grouped into a folder. A folder is therefore a set of frame instances, which need not be over the same frame template.
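The three notions above (frame template, frame instance, folder) can be pictured with a minimal Python sketch. The class names, the `instantiate` method, and the sample templates and values are illustrative assumptions, not definitions taken from the dissertation; a list stands in for the set of frame instances for simplicity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FrameTemplate:
    """Describes the attributes shared by one class of documents."""
    name: str
    attributes: tuple

    def instantiate(self, **values):
        """Fill the template with values to produce a frame instance."""
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise ValueError(f"not in template {self.name}: {sorted(unknown)}")
        # Attributes without a supplied value are left as None in this sketch.
        return FrameInstance(self, {a: values.get(a) for a in self.attributes})


@dataclass
class FrameInstance:
    """The synopsis of one document: a template plus its attribute values."""
    template: FrameTemplate
    values: dict


@dataclass
class Folder:
    """A collection of frame instances, possibly over different templates."""
    name: str
    instances: list = field(default_factory=list)

    def add(self, instance):
        self.instances.append(instance)


# Hypothetical templates and values, for illustration only.
memo = FrameTemplate("Memo", ("sender", "receiver", "subject"))
invoice = FrameTemplate("Invoice", ("vendor", "amount"))

folder = Folder("Budget")
folder.add(memo.instantiate(sender="Smith", receiver="Jones", subject="budget"))
folder.add(invoice.instantiate(vendor="Acme", amount=1200))
```

Here the folder holds one instance of each template, which mirrors the statement that a folder need not be homogeneous.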

The D_model is a dual model which describes documents using two hierarchies: a document type hierarchy, which depicts the structural organization of the documents, and a folder organization, which represents the user's real-world document filing system. The document type hierarchy exploits structural commonalities between frame templates and thereby helps classify documents. The folder organization gives the user an intuitively clear view of the filing system, which facilitates document retrieval.

The D_algebra includes a family of operators which together comprise the fundamental query language for the D_model. The algebra provides operators that can be applied to folders containing frame instances of different types. It has more expressive power than the classical relational algebra, which it extends by associating attributes with types and supporting attribute inheritance. Aggregate operators which can be applied to different frame instances in a folder are also provided. The proposed algebra serves as a sound basis for expressing the semantics of a high-level query language for a document processing system called TEXPROS.
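To make the idea of operators over heterogeneous folders concrete, the following Python sketch shows a selection and a simple aggregate applied to a folder whose frame instances have different types, with attributes inherited along a type hierarchy. The hierarchy, the attribute names, and the functions `attributes_of`, `select`, and `count` are hypothetical; they are not the D_algebra operators as defined in the dissertation.

```python
# Hypothetical type hierarchy: child type -> parent type.
PARENT = {"Memo": "Document", "Invoice": "Document", "Document": None}

# Attributes declared directly on each type (illustrative only).
DECLARED = {
    "Document": ("date",),
    "Memo": ("sender", "subject"),
    "Invoice": ("vendor", "amount"),
}


def attributes_of(type_name):
    """Collect declared and inherited attributes, walking up the hierarchy."""
    attrs = []
    while type_name is not None:
        attrs = list(DECLARED[type_name]) + attrs
        type_name = PARENT[type_name]
    return attrs


def select(folder, attribute, predicate):
    """Keep frame instances whose type has `attribute` and whose value passes."""
    return [
        inst for inst in folder
        if attribute in attributes_of(inst["type"])
        and predicate(inst["values"].get(attribute))
    ]


def count(folder):
    """A simple aggregate applied across instances of any type."""
    return len(folder)


folder = [
    {"type": "Memo", "values": {"date": 1993, "sender": "Smith", "subject": "budget"}},
    {"type": "Invoice", "values": {"date": 1992, "vendor": "Acme", "amount": 1200}},
    {"type": "Invoice", "values": {"date": 1993, "vendor": "Bolt", "amount": 300}},
]

# "date" is inherited by every type, so this selection spans memos and invoices.
recent = select(folder, "date", lambda d: d is not None and d >= 1993)
# "amount" exists only on invoices, so memos are skipped rather than rejected with an error.
large = select(folder, "amount", lambda a: a is not None and a > 1000)
```

In a relational setting each relation has a single fixed schema, so a selection of this kind would need one query per relation; typing the attributes and inheriting them is what lets one operator range over the whole folder in this sketch.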

In the model, frame instances can represent incomplete information. Null values of the form "value at present unknown" denote missing information in some fields of incomplete frame instances. This dissertation provides a proof-theoretic characterization of the data model and defines the semantics of the null values within the proof-theoretic paradigm.
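One common way to picture a "value at present unknown" null is to let a query distinguish answers that certainly hold from answers that might hold. The Python sketch below does this for a single equality test. It is an assumption-laden illustration: the `UNKNOWN` sentinel, the `match` function, and the "sure"/"maybe"/"no" labels are hypothetical and do not reproduce the proof-theoretic semantics developed in the dissertation.

```python
class Unknown:
    """Sentinel for a field whose value exists but is not known at present."""

    def __repr__(self):
        return "UNKNOWN"


UNKNOWN = Unknown()


def match(instance, attribute, wanted):
    """Return 'sure', 'maybe', or 'no' for an equality test on one field."""
    value = instance[attribute]
    if value is UNKNOWN:
        # The real value could equal `wanted`, so the answer is undetermined.
        return "maybe"
    return "sure" if value == wanted else "no"


# Hypothetical incomplete frame instances, for illustration only.
memos = [
    {"sender": "Smith", "subject": "budget"},
    {"sender": UNKNOWN, "subject": "budget"},
    {"sender": "Jones", "subject": "travel"},
]

results = [match(m, "sender", "Smith") for m in memos]
```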
