Tool for classifying office documents

Document Type

Conference Proceeding

Publication Date

12-1-1993

Abstract

This paper presents the design of a tool for classifying office documents. We represent a document's layout structure as an ordered labeled tree, called the 'layout structure tree' (L-S-Tree), based on a nested segmentation procedure. The tool takes a sample-based approach to learning: concepts are learned by retaining samples, and new documents are classified by matching their L-S-Trees against those samples. The matching process involves both computing the edit distance between two trees, using a previously developed pattern matching toolkit, and calculating the degree of conceptual closeness between the documents and the samples. Our experimental results show that the tool can classify various types of office documents, even with very few samples in the sample base.
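
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' toolkit: it assumes unit costs for insert, delete, and relabel, uses a plain memoized recursion over ordered forests, and classifies by the nearest sample only. The "conceptual closeness" measure is not specified in this record and is omitted. The names node, forest_distance, tree_distance, and classify, as well as the layout labels, are hypothetical.

```python
from functools import lru_cache


def node(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) pair."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + f_kids, g) + 1
    insert = forest_distance(f, g[:-1] + g_kids) + 1
    # Map the two rightmost roots onto each other, then match their
    # subtrees and the remaining forests independently.
    match = (
        forest_distance(f_kids, g_kids)
        + forest_distance(f[:-1], g[:-1])
        + (0 if f_label == g_label else 1)
    )
    return min(delete, insert, match)


def tree_distance(a, b):
    """Edit distance between two single trees."""
    return forest_distance((a,), (b,))


def classify(tree, samples):
    """Return the class name of the sample closest to `tree`.

    `samples` is a list of (class_name, sample_tree) pairs.
    """
    return min(samples, key=lambda s: tree_distance(tree, s[1]))[0]


# Hypothetical sample base with one layout tree per document type.
letter = node("page", node("header"),
              node("body", node("para"), node("para")),
              node("signature"))
memo = node("page", node("header", node("to"), node("from")),
            node("body", node("para")))
samples = [("letter", letter), ("memo", memo)]

unknown = node("page", node("header"),
               node("body", node("para")),
               node("signature"))
print(classify(unknown, samples))
```

With these toy trees the unknown document differs from the letter sample by a single inserted paragraph, so it is assigned to that class. This recursion is suitable only for small trees; a production matcher would use a more efficient tree edit distance algorithm.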

Identifier

0027810377 (Scopus)

ISBN

0818642009

Publication Title

Proceedings of the International Conference on Tools with Artificial Intelligence

ISSN

1063-6730

First Page

427

Last Page

434

