Document Type


Date of Award

Spring 5-31-1996

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)


Computer and Information Science

First Advisor

Jason T. L. Wang

Second Advisor

James A. McHugh

Third Advisor

David Nassimi

Fourth Advisor

Peter A. Ng

Fifth Advisor

Wen-Syan Li


Sequence databases comprise sequence data, which are linear structural descriptions of many natural entities. Approximate pattern discovery in a sequence database can lead to important conclusions or prediction of new phenomena. Traditional database technology is not suitable for accomplishing the task, and new techniques need to be developed.

In this dissertation, we propose several new techniques for discovering patterns in sequence databases. Our techniques incorporate pattern matching algorithms and novel heuristics for discovery and optimization. Experimental results of applying the techniques to both generated data and DNA/proteins show the effectiveness of the proposed techniques.

We then develop several classifiers using our pattern discovery algorithms and a previously published fingerprint technique. When we apply the classifiers to classify DNA and protein sequences, they give information that is complementary to the best classifiers available today.



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.