Date of Award
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Computer and Information Science
Frank Y. Shih
Peter A. Ng
James A. McHugh
This dissertation presents document preprocessing and fuzzy unsupervised character classification for automatically reading office documents that are received daily and have complex layout structures, such as multiple columns and mixed-mode content comprising text, graphics, and half-tone pictures. First, a block segmentation algorithm based on a simple two-step run-length smoothing decomposes a document into single-mode blocks. Next, block classification based on clustering rules assigns each block to one of several types: text, horizontal or vertical lines, graphics, or pictures. The mean white-to-black transition is shown to be invariant for textual blocks and is useful for block discrimination.
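The segmentation and discrimination ideas in the abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes that the two steps are a horizontal smoothing pass followed by a vertical pass over the result, that images are lists of rows with 1 for black and 0 for white, and that the transition feature is averaged per row. The function names and the threshold parameters are hypothetical.

```python
def smooth_runs(row, threshold):
    """Fill white runs (0s) of length <= threshold that lie between two black pixels (1s)."""
    out = list(row)
    last_black = None
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smooth_image(image, h_threshold, v_threshold):
    """Assumed two-step smoothing: horizontal pass, then vertical pass on its output."""
    # Step 1: smooth every row.
    horizontal = [smooth_runs(row, h_threshold) for row in image]
    # Step 2: smooth every column of the step-1 result, then transpose back to rows.
    columns = [smooth_runs(list(col), v_threshold) for col in zip(*horizontal)]
    return [list(row) for row in zip(*columns)]


def mean_white_to_black_transitions(block):
    """Average number of 0 -> 1 transitions per row of a non-empty block."""
    counts = [
        sum(1 for a, b in zip(row, row[1:]) if a == 0 and b == 1)
        for row in block
    ]
    return sum(counts) / len(counts)
```

After smoothing, connected black regions would be taken as candidate blocks, and a feature such as the mean transition count could be computed on the original pixels of each block. The thresholds control how aggressively nearby characters and lines merge, so they would need tuning to the scan resolution.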
Chen, Shy-Shyan, "Document preprocessing and fuzzy unsupervised character classification" (1995). Dissertations. 1109.