Mining genes in DNA using GeneScout
Document Type
Conference Proceeding
Publication Date
12-1-2002
Abstract
In this paper, we present a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. The system contains specially designed hidden Markov models (HMMs) for detecting functional sites such as protein-translation start sites and mRNA splicing junction donor and acceptor sites. Our main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to that of analyzing the paths in the graph G. A dynamic programming algorithm is used to find the optimal path in G. The proposed system is trained using an expectation-maximization (EM) algorithm, and its performance on vertebrate gene prediction is evaluated using 10-fold cross-validation. Experimental results show that the proposed system performs well and is complementary to a widely used gene detection system. © 2002 IEEE.
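The path-finding step named in the abstract can be illustrated with a minimal sketch. It is not the GeneScout implementation: the abstract does not specify how nodes or edge scores are defined, so the site labels ("start", "donor1", and so on), the numeric weights, and the function name best_path are all hypothetical. The sketch only shows the general technique of finding a highest-scoring path in a directed acyclic graph by dynamic programming over a topological order.

```python
from collections import defaultdict


def best_path(edges, source, sink):
    """Return (score, path) for the highest-scoring source-to-sink path in a DAG.

    edges is a list of (u, v, weight) tuples. Returns None if sink is
    unreachable from source. Node labels and weights are illustrative only.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Topological order (Kahn's algorithm), so every node is processed
    # only after all of its predecessors.
    order = []
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Dynamic programming: best[v] is the best score of any path source -> v.
    best = {source: 0.0}
    parent = {}
    for u in order:
        if u not in best:  # not reachable from source
            continue
        for v, w in graph[u]:
            candidate = best[u] + w
            if v not in best or candidate > best[v]:
                best[v] = candidate
                parent[v] = u

    if sink not in best:
        return None

    # Walk parent pointers back from sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return best[sink], path


# Hypothetical candidate sites and scores, for illustration only.
EDGES = [
    ("start", "donor1", 2.0),
    ("start", "donor2", 1.0),
    ("donor1", "acceptor1", 1.5),
    ("donor2", "acceptor1", 3.5),
    ("acceptor1", "stop", 1.0),
    ("donor1", "stop", 0.5),
]

if __name__ == "__main__":
    print(best_path(EDGES, "start", "stop"))
```

Because each node is visited once in topological order and each edge is relaxed once, the running time is linear in the size of the graph. In a real gene finder the edge weights would presumably come from the trained site and region models rather than from fixed constants.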
Identifier
78149338925 (Scopus)
ISBN
0769517544, 9780769517544
Publication Title
Proceedings IEEE International Conference on Data Mining (ICDM)
ISSN
15504786
First Page
733
Last Page
736
Recommended Citation
Yin, Michael M. and Wang, Jason T.L., "Mining genes in DNA using GeneScout" (2002). Faculty Publications. 14529.
https://digitalcommons.njit.edu/fac_pubs/14529
