Effective hidden Markov models for detecting splicing junction sites in DNA sequences

Document Type

Conference Proceeding

Publication Date

11-1-2001

Abstract

Identification or prediction of coding sequences from within genomic DNA has been a major rate-limiting step in the pursuit of genes. Programs currently available are far from being powerful enough to elucidate the gene structure completely. In this paper, we develop effective hidden Markov models (HMMs) to represent the consensus and degeneracy features of splicing junction sites in eukaryotic genes. Our HMM system based on the developed HMMs is fully trained using an expectation maximization (EM) algorithm, and the system performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system can correctly detect 92% of the true donor sites and 91.5% of the true acceptor sites in the test data set containing real vertebrate gene sequences. These results suggest that our approach provides a useful tool for discovering the splicing junction sites in eukaryotic genes. © 2001 Elsevier Science Inc. All rights reserved.

Identifier

0035504952 (Scopus)

Publication Title

Information Sciences

External Full Text Location

https://doi.org/10.1016/S0020-0255(01)00160-8

ISSN

00200255

First Page

139

Last Page

163

Issue

1-2

Volume

139
