Document Type


Date of Award

Spring 5-31-2013

Degree Name

Master of Science in Bioinformatics - (M.S.)


Computer Science

First Advisor

Zhi Wei

Second Advisor

Usman W. Roshan

Third Advisor

Jason T. L. Wang


Alternative polyadenylation (APA) of mRNA plays a crucial role in post-transcriptional gene regulation. Recent advances in next-generation sequencing technology have made it possible to efficiently characterize the transcriptome and identify the 3’ ends of polyadenylated RNAs. However, no comprehensive bioinformatics pipeline has yet fulfilled this goal. This thesis proposes PolyASeeker, a computational framework for identifying polyadenylation cleavage sites from RNA-Seq data. Using a simulated RNA-Seq dataset, a novel method is developed to evaluate the performance of the proposed framework against the traditional A-stretch approach and to compute accurate precision and recall, which previous estimation methods could not provide. The proposed method achieves significantly higher sensitivity than the A-stretch approach in various scenarios. In further studies, PolyASeeker is applied to human tissue-specific RNA-Seq data, and the polyA sites identified by PolyASeeker and annotated with PolyA DB reveal distinct isoform expression patterns among tissues. Genes with a specific 3’ UTR expression pattern are also identified in the brain. PolyASeeker is also run on an mRNA 3’ UTR sequencing dataset and is found to adapt well to this type of data. Significant isoform shortening events, backed by expression evidence and experimental support, are found.