Tool for classifying office documents
Document Type
Conference Proceeding
Publication Date
12-1-1993
Abstract
This paper presents the design of a tool for classifying office documents. We represent a document's layout structure using an ordered labeled tree, called the 'layout structure tree' (L-S-Tree), based on a nested segmentation procedure. The tool takes a sample-based approach to learning: concepts are learned by retaining samples, and new documents are classified by matching their L-S-Trees against the samples. The matching process involves both computing the edit distance between two trees, using a previously developed pattern matching toolkit, and calculating the degree of conceptual closeness between the documents and the samples. Our experimental results show that the tool is capable of classifying various types of office documents, even with very few samples in the sample base.
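The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' toolkit: it assumes unit costs for inserting, deleting, and relabeling a node, uses a plain memoized recurrence for ordered-tree edit distance, and leaves out the conceptual-closeness measure entirely. The names `node`, `tree_edit_distance`, and `classify`, as well as the layout labels in the sample trees, are hypothetical and chosen only for illustration.

```python
from functools import lru_cache


def node(label, *children):
    """Build an ordered labeled tree as (label, tuple_of_child_trees)."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees).

    Assumption: every insert, delete, or relabel costs 1.
    """
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + kids_f, g) + 1
    insert = forest_distance(f, g[:-1] + kids_g) + 1
    # Match the two rightmost roots: compare their subtrees and the rest.
    match = (
        forest_distance(kids_f, kids_g)
        + forest_distance(f[:-1], g[:-1])
        + (label_f != label_g)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def classify(tree, samples):
    """Return the class of the retained sample closest to `tree`.

    `samples` is a list of (class_name, sample_tree) pairs.
    """
    return min(samples, key=lambda s: tree_edit_distance(tree, s[1]))[0]


# Hypothetical sample base with one retained sample per document type.
letter = node("page", node("letterhead"), node("date"),
              node("body"), node("signature"))
memo = node("page", node("header", node("to"), node("from")), node("body"))
samples = [("letter", letter), ("memo", memo)]

# A new document whose layout tree lacks the date block.
query = node("page", node("letterhead"), node("body"), node("signature"))
print(classify(query, samples))
```

In this sketch a single retained sample per class is enough to separate the two layouts, which mirrors the abstract's point about small sample bases; a real system would also need the segmentation step that produces the trees.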
Identifier
0027810377 (Scopus)
ISBN
0818642009
Publication Title
Proceedings of the International Conference on Tools with Artificial Intelligence
ISSN
1063-6730
First Page
427
Last Page
434
Recommended Citation
Hao, Xiaolong; Wang, Jason T.; Bieber, Michael P.; and Ng, Peter A., "Tool for classifying office documents" (1993). Faculty Publications. 16976.
https://digitalcommons.njit.edu/fac_pubs/16976
