A document classification and extraction system with learning ability

Document Type

Conference Proceeding

Publication Date

1-1-1999

Abstract

Document image processing begins at the OCR phase with the difficulty of automatic document analysis and understanding. Most existing systems perform well only in their specific application domains. In this paper, we describe a domain-independent automatic document image understanding system with learning ability. A segmentation method based on "logical closeness" is proposed. A novel and natural representation of document layout structure, a directed weight graph (DWG), is described. To classify a given document, a string representation matching algorithm is applied first, instead of comparing all the sample graphs. A frame template and a document type hierarchy (DTH) are used to represent the document's logical structure and the hierarchical relationships among these frame templates, respectively. Two learning methodologies are applied: learning from experience and an enhanced perceptron learning algorithm.
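For readers unfamiliar with perceptron learning, the following is a minimal sketch of the classic perceptron update rule only. It is not the enhanced algorithm the paper describes, whose details are not given in this abstract. The function names, the two-element feature vectors, and the toy data are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 20,
    lr: float = 1.0,
) -> Tuple[List[float], float]:
    """Classic perceptron training; labels must be +1 or -1.

    Assumption: this is the textbook rule, not the paper's enhanced variant.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else -1
            if predicted != y:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:
            # Every sample is classified correctly, so training can stop.
            break
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 or -1 for a feature vector x."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


# Hypothetical layout features for two made-up document types.
toy_samples = [(1.0, 0.0), (0.9, 0.2), (0.1, 1.0), (0.0, 0.8)]
toy_labels = [1, 1, -1, -1]

toy_weights, toy_bias = train_perceptron(toy_samples, toy_labels)
toy_predictions = [predict(toy_weights, toy_bias, x) for x in toy_samples]
```

On linearly separable data such as this toy set, the loop stops as soon as an epoch produces no misclassifications, so `toy_predictions` matches `toy_labels`.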

Identifier

18744380482 (Scopus)

ISBN

0769503187

Publication Title

Proceedings of the International Conference on Document Analysis and Recognition ICDAR

External Full Text Location

https://doi.org/10.1109/ICDAR.1999.791758

ISSN

1520-5363

First Page

197

Last Page

200
