Effective hidden Markov models for detecting splicing junction sites in DNA sequences
Document Type
Conference Proceeding
Publication Date
11-1-2001
Abstract
Identification or prediction of coding sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Programs currently available are far from being powerful enough to elucidate the gene structure completely. In this paper, we develop effective hidden Markov models (HMMs) to represent the consensus and degeneracy features of splicing junction sites in eukaryotic genes. Our HMM system based on the developed HMMs is fully trained using an expectation maximization (EM) algorithm and the system performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system can correctly detect 92% of the true donor sites and 91.5% of the true acceptor sites in the test data set containing real vertebrate gene sequences. These results suggest that our approach provides a useful tool for discovering the splicing junction sites in eukaryotic genes. © 2001 Elsevier Science Inc. All rights reserved.
Identifier
0035504952 (Scopus)
Publication Title
Information Sciences
External Full Text Location
https://doi.org/10.1016/S0020-0255(01)00160-8
ISSN
0020-0255
First Page
139
Last Page
163
Issue
1-2
Volume
139
Recommended Citation
Yin, Michael M. and Wang, Jason T.L., "Effective hidden Markov models for detecting splicing junction sites in DNA sequences" (2001). Faculty Publications. 15088.
https://digitalcommons.njit.edu/fac_pubs/15088
