Date of Award
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Computer and Information Science
James A. McHugh
Peter A. Ng
Jason T. L. Wang
Ronald S. Curtis
This dissertation presents a formal approach to modeling documents in a personal office environment, proposes a heterogeneous algebraic query language to manipulating objects (folders) in the document model, and investigates a predicate-driven document filing system for automatically filing documents.
The document model was initially proposed in  which adopts a very natural view for describing the office documents using the relational and object-oriented paradigms. The model employs a dual approach to classifying and categorizing office documents by defining both a document type hierarchy and a folder organization. This dissertation extends and specifies formally the document model. Documents are partitioned into different classes, each document class being represented by frame template which describes the properties of the documents of the class. A particular office document, summarized from the view point of its frame template, yields a synopsis of the document which is called frame instances. Frame instances are grouped into a folder on the basis of user-defined criteria, specified as predicates, which determine whether a frame instance belongs to a folder. Folders, each of which is a heterogeneous set of frame instances, can be naturally organized into a folder organization. The folder organization specifying the document filing view is then defined using predicates and a directed graph. However, some operators in the algebraic query language  do not support the heterogeneous property. This dissertation proposes an algebra-based query language that gives full support to this heterogeneous property.
We investigate the construction problem of a folder organization: does it allow a user to add a new folder with an arbitrary local predicate? Given a folder organization, creating a new folder with arbitrarily defined predicate may cause two abnormalities: inapplicable edges (filing paths) and redundant folders. To deal such abnormalities in the process of constructing a folder organization, the concept of predicate consistency is discussed and an algorithm is proposed for determining whether the predicate of a new folder is consistent with the existing folder organization.
The global predicate of a folder governs the content of the folder. However, the predicates of folders (that is, global predicates) do not uniquely specify a folder organization. Then, we investigate the reconstruction problem: under what circumstance can we uniquely recover the folder organization from its global predicates? The problem is solved in terms of graph-theoretic concepts such as associated digraphs, transitive closure, and redundant/non-redundant filing paths. A transitive closure inversion algorithm is then presented which efficiently recovers a folder organization digraph from its associated digraph.
After defining a folder organization, we can file a frame instance into the folder organization. A document filing algorithm describes the procedure of filing a frame instance. However, the critical issue of the algorithm is how to evaluate whether a frame instance satisfies the predicate of a folder in a folder organization. In order to solve this issue, a thesaurus, an association dictionary and a knowledge base are then introduced. The thesaurus specifies the association relationship among the key terms that are actually residing in the system and terms that are used by users. An association dictionary gives the association relationship between an attribute of a predicate and a frame template defined in a folder organization. A knowledge base represents background knowledge in a certain application domain.
Zhu, Zhijian, "On document filing based upon predicates" (1997). Dissertations. 1070.