Document Type


Date of Award

Spring 5-31-2013

Degree Name

Master of Science in Bioinformatics - (M.S.)


Computer Science

First Advisor

Zhi Wei

Second Advisor

Usman W. Roshan

Third Advisor

Jason T. L. Wang


Alternative polyadenylation (APA) of mRNA plays a crucial role in post-transcriptional gene regulation. Recent advances in next-generation sequencing technology have made it possible to efficiently characterize the transcriptome and identify the 3’ ends of polyadenylated RNAs. However, no comprehensive bioinformatic pipeline has yet fulfilled this goal. This thesis proposes PolyASeeker, a computational framework for identifying polyadenylation cleavage sites from RNA-Seq data. Using a simulated RNA-Seq dataset, a novel method is developed to evaluate the performance of the proposed framework against the traditional A-stretch approach and to compute accurate precision and recall, which previous estimates could not provide. The proposed method achieves significantly higher sensitivity than the A-stretch approach in various scenarios. In further studies, PolyASeeker is applied to human tissue-specific RNA-sequencing data; from the polyA sites identified by PolyASeeker and annotated by PolyA DB, distinct isoform expression patterns among tissues are found. Genes with a specific 3’ UTR expression pattern are also identified in the brain. PolyASeeker is also run on an mRNA 3’ UTR sequencing dataset, and the software is found to adapt well to this type of data. Significant isoform shortening events, supported by expression evidence and experimental results, are found.
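
The evaluation idea in the abstract, scoring predicted cleavage sites against the known sites of a simulated dataset, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the (chromosome, strand, position) site representation, and the 10-nucleotide matching tolerance are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def _index(sites):
    """Group site positions by (chrom, strand) and sort them for binary search."""
    by_key = defaultdict(list)
    for chrom, strand, pos in sites:
        by_key[(chrom, strand)].append(pos)
    for positions in by_key.values():
        positions.sort()
    return by_key


def _has_neighbor(sorted_positions, pos, tolerance):
    """True if any position lies within +/- tolerance of pos."""
    i = bisect_left(sorted_positions, pos - tolerance)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + tolerance


def precision_recall(predicted, truth, tolerance=10):
    """Score predicted polyA sites against simulated (known) sites.

    predicted, truth: lists of (chrom, strand, position) tuples.
    tolerance: assumed matching window in nucleotides (hypothetical default).
    """
    pred_idx = _index(predicted)
    true_idx = _index(truth)
    # Precision: share of predictions that fall near some true site.
    hits_pred = sum(
        _has_neighbor(true_idx.get((c, s), []), p, tolerance)
        for c, s, p in predicted
    )
    # Recall (sensitivity): share of true sites recovered by some prediction.
    hits_true = sum(
        _has_neighbor(pred_idx.get((c, s), []), p, tolerance)
        for c, s, p in truth
    )
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_true / len(truth) if truth else 0.0
    return precision, recall
```

Because a simulated dataset supplies the complete set of true sites, both quantities can be computed directly; with real data, where the full set of true sites is unknown, recall in particular can only be estimated.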


