Document Type


Date of Award

Summer 8-31-2002

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)


Computer and Information Science

First Advisor

Jason T. L. Wang

Second Advisor

James A. McHugh

Third Advisor

Frank Y. Shih

Fourth Advisor

Vincent Oria

Fifth Advisor

Xiaoan Ruan


This dissertation research is targeted toward developing effective and accurate methods for identifying gene structures in the genomes of high eukaryotes, such as vertebrate organisms. Several effective hidden Markov models (HMMs) are developed to represent the consensus and degeneracy features of the functional sites including protein-translation start sites, mRNA splicing junction donor and acceptor sites in vertebrate genes. The HMM system based on the developed models is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system achieves high sensitivity and specificity in detecting the functional sites.

This HMM system is then incorporated into a new gene detection system, called GeneScout. The main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to the analysis of paths in the graph G. A dynamic programming algorithm is employed by GeneScout to find the optimal path in G. Experimental results on the standard test dataset collected by Burset and Guigo indicate that GeneScout is comparable to existing gene discovery tools and complements the widely used GenScan system.



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.