Date of Award

Spring 1996

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer and Information Science

First Advisor

Jason T. L. Wang

Second Advisor

James A. McHugh

Third Advisor

David Nassimi

Fourth Advisor

Peter A. Ng

Fifth Advisor

Wen-Syan Li

Abstract

Sequence databases comprise sequence data, which are linear structural descriptions of many natural entities. Approximate pattern discovery in a sequence database can lead to important conclusions or prediction of new phenomena. Traditional database technology is not suitable for accomplishing the task, and new techniques need to be developed.

In this dissertation, we propose several new techniques for discovering patterns in sequence databases. Our techniques incorporate pattern matching algorithms and novel heuristics for discovery and optimization. Experimental results of applying the techniques to both generated data and DNA/proteins show the effectiveness of the proposed techniques.

We then develop several classifiers using our pattern discovery algorithms and a previously published fingerprint technique. When we apply the classifiers to classify DNA and protein sequences, they give information that is complementary to the best classifiers available today.

Share

COinS