Document Type


Date of Award

Fall 10-31-1997

Degree Name

Master of Science in Computer Science - (M.S.)


Computer and Information Science

First Advisor

Jason T. L. Wang

Second Advisor

James A. McHugh

Third Advisor

Peter A. Ng


Multiple sequence alignment has proven to be a successful method of representing and organizing protein sequence data. It is crucial to medical research on the structure and function of proteins.

Numerous tools have been published for abstracting meaningful relationships between an unknown sequence and a set of known sequences. One study used a method for discovering active motifs in a set of related protein sequences. Such motifs are meaningful knowledge abstracted from the known protein database, since most protein families are characterized by multiple local motifs. Another study abstracts knowledge about the input sequence using an algorithm preconstructed from a set of sequences.

Most of these classification studies use statistically optimized heuristics to enhance their accompanying algorithms. Therefore, these algorithms can be analyzed for statistical significance using Bayes' theorem.