Date of Award
Master of Science in Computer Science - (M.S.)
Computer and Information Science
Jason T. L. Wang
James A. McHugh
Peter A. Ng
Multiple sequence alignment has proven to be a successful method of representing and organizing protein sequence data. It is crucial to medical research on the structure and function of proteins.
Numerous tools have been published for extracting meaningful relationships between an unknown sequence and a set of known sequences. One study used a method for discovering active motifs in a set of related protein sequences. Such motifs are meaningful knowledge extracted from the known protein database, since most protein families are characterized by multiple local motifs. Another study derives knowledge about the input sequence using an algorithm constructed in advance from a set of sequences.
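As a rough illustration of what it means for a motif to characterize a family, the sketch below measures the fraction of sequences in a set that contain a given segment. It is not the motif-discovery method of the cited study: the function name `motif_support`, the exact-substring matching rule, and the example sequences are all assumptions made for illustration.

```python
def motif_support(motif, sequences):
    """Return the fraction of sequences that contain `motif` as an exact substring.

    Hypothetical helper: real motif discovery typically allows mismatches
    and gaps, which this sketch deliberately ignores.
    """
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if motif in seq)
    return hits / len(sequences)


# Made-up sequences, not taken from any protein database.
family = ["MKVLAAGIV", "TTKVLAGHH", "GGKVLAPQR", "MSTNPKPQR"]
support = motif_support("KVLA", family)
print(f"support of KVLA: {support:.2f}")
```

A segment with high support inside a family and low support outside it is a candidate for the kind of local motif the abstract describes.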
Most of these classification studies use statistically optimized heuristics to enhance their accompanying algorithms. Therefore, these algorithms can be analyzed for statistical significance using Bayes' theorem.
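To make the Bayesian reading concrete, the sketch below applies Bayes' theorem to a single classifier decision: given that an algorithm assigns a sequence to a family, how probable is it that the sequence truly belongs there? This is a generic illustration, not the analysis carried out in the thesis. The function name `posterior` and the numeric rates are assumptions chosen for the example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' theorem for a binary classifier decision.

    prior               -- P(member): fraction of sequences that belong to the family
    sensitivity         -- P(positive | member)
    false_positive_rate -- P(positive | non-member)
    Returns P(member | positive).
    """
    # Total probability of a positive call, over members and non-members.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("a positive call has zero probability under these rates")
    return sensitivity * prior / evidence


# Assumed rates for illustration only.
p = posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(member | positive) = {p:.3f}")
```

With a rare family, even a sensitive classifier yields a modest posterior, which is why the significance of a classification depends on the prior as well as on the algorithm's error rates.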
Shih, Tom Tien-Hua, "Testing statistical significance in sequence classification algorithms" (1997). Theses. 1034.