A document classification and extraction system with learning ability
Document Type
Conference Proceeding
Publication Date
1-1-1999
Abstract
Document image processing begins at the OCR phase, and its main difficulty lies in automatic document analysis and understanding. Most existing systems perform well only within their specific application domains. In this paper, we describe a domain-independent automatic document image understanding system with learning ability. A segmentation method based on "logical closeness" is proposed. A novel and natural representation of document layout structure, a directed weight graph (DWG), is described. To classify a given document, a string representation matching algorithm is applied first, instead of comparing the document against all the sample graphs. A frame template represents a document's logical structure, and a document type hierarchy (DTH) represents the hierarchical relationships among these frame templates. Two learning methodologies are applied: learning from experience and an enhanced perceptron learning algorithm.
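The abstract does not say how a layout is turned into a string or which matching algorithm is used, so the following is only a minimal sketch of the general idea under stated assumptions: each page is reduced to a sequence of hypothetical block labels in reading order, and the nearest stored sample is found with Levenshtein edit distance (a swapped-in, standard technique, not necessarily the authors'). Comparing short strings is much cheaper than matching a layout graph against every sample graph, which is the motivation the abstract gives for matching strings first. The names `Block`, `layout_string`, `classify`, and the label alphabet are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A layout block; the label alphabet ("T" title, "H" header, "P" paragraph, ...) is hypothetical."""
    label: str
    top: float   # vertical position on the page
    left: float  # horizontal position on the page


def layout_string(blocks: list[Block]) -> str:
    """Serialize a layout as block labels in top-to-bottom, left-to-right order (assumed reading order)."""
    ordered = sorted(blocks, key=lambda b: (b.top, b.left))
    return "".join(b.label for b in ordered)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def classify(blocks: list[Block], samples: dict[str, str]) -> str:
    """Return the document type whose stored sample layout string is closest to the query."""
    query = layout_string(blocks)
    return min(samples, key=lambda doc_type: edit_distance(query, samples[doc_type]))


# Hypothetical sample layout strings, one per document type.
samples = {"letter": "HADPPS", "memo": "THPP", "invoice": "HTTTS"}
page = [Block("P", 40, 0), Block("T", 0, 0), Block("H", 10, 0), Block("P", 20, 0)]
print(layout_string(page), classify(page, samples))
```

In a fuller system, the string match would presumably only shortlist candidate types, with a finer structural comparison (for example against the DWG) reserved for the closest few.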
Identifier
18744380482 (Scopus)
ISBN
0769503187
Publication Title
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
External Full Text Location
https://doi.org/10.1109/ICDAR.1999.791758
ISSN
1520-5363
First Page
197
Last Page
200
Recommended Citation
Li, Xuhong and Ng, Peter A., "A document classification and extraction system with learning ability" (1999). Faculty Publications. 16140.
https://digitalcommons.njit.edu/fac_pubs/16140
