Document Type
Dissertation
Date of Award
Spring 5-31-2009
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
Jason T. L. Wang
Second Advisor
Bin Tian
Third Advisor
Narain Gehani
Fourth Advisor
James A. McHugh
Fifth Advisor
Marvin K. Nakayama
Abstract
RNA molecules play various important roles in the cell, and their functionality depends not only on sequence information but, to a large extent, on their structure. The development of computational and predictive approaches to study RNA molecules is therefore extremely valuable. In this research, a tool named RADAR was developed that provides a range of functionality for RNA data analysis and research. It aligns structure-annotated RNA sequences so that both sequence and structure information are taken into consideration. The tool is capable of performing pairwise structure alignment, multiple structure alignment, database search and clustering. In addition, it provides two salient features: (i) constrained alignment of RNA secondary structures, and (ii) prediction of a consensus structure for a set of RNA sequences. The tool is hosted on the web, where it can be freely accessed and the software can be downloaded, at http://datalab.njit.edu/biodata/rna/RSmatch/server.htm. The RADAR software has been applied to various datasets (genomes of various mammals, viruses and parasites), and the experimental results show that this approach is capable of detecting functionally important regions.
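To illustrate what a structure-annotated RNA sequence might look like as input to such an aligner, the following is a minimal sketch. It assumes the secondary structure is given in dot-bracket notation, which the abstract does not specify, and the function names `parse_dot_bracket` and `annotate` are hypothetical; this is not RADAR's actual input format or code.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of base-pair index tuples (i, j) for a
    dot-bracket string. Assumption: '(' and ')' mark paired bases and
    '.' marks unpaired bases; pseudoknots are not supported."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def annotate(sequence, structure):
    """Pair each base with the index of its partner (None if unpaired),
    so that sequence and structure can be compared position by position."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    partner = {}
    for i, j in parse_dot_bracket(structure):
        partner[i] = j
        partner[j] = i
    return [(base, partner.get(i)) for i, base in enumerate(sequence)]


# Hypothetical hairpin: a two-base-pair stem closing a two-base loop.
annotated = annotate("GGAACC", "((..))")
```

An alignment method that considers both kinds of information could then score two such annotated positions on the base identity and on whether each position is paired or unpaired.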
As an application of RADAR, a systematic data mining approach termed GLEAN-UTR was developed to identify small stem-loop RNA structure elements in the untranslated regions (UTRs) that are conserved between human and mouse orthologs and exist in multiple genes with common Gene Ontology terms. This study resulted in 90 distinct RNA structure groups containing 748 structures, with the 3' histone stem loop (HSL3) and the iron response element (IRE) among the top hits.
Further, the role played by structure in mRNA polyadenylation was investigated. Polyadenylation is an important step towards the maturation of almost all cellular mRNAs in eukaryotes. Studies have identified several cis-elements, besides the widely known polyadenylation signal (PAS) element (AATAAA, ATTAAA or a close variant), which may play a role in poly(A) site identification. In this study, the differences in structural stability of the sequences surrounding poly(A) sites were investigated, and it was found that for genes containing a single poly(A) site, the surrounding sequence is more stable than the surrounding sequences of alternative poly(A) sites. This indicates that structure may provide an evolutionary advantage for single poly(A) sites that prevents multiple poly(A) sites from arising. In addition, the study found that the structural stability of the region surrounding a polyadenylation site correlates with its distance from the next gene, with shorter distances corresponding to greater structural stability.
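As a small illustration of the PAS element mentioned above, the following sketch scans a sequence for the two canonical hexamers named in the abstract (AATAAA and ATTAAA). The function name `find_pas_hexamers` is hypothetical, close variants are not handled, and the structural stability analysis described in the abstract is not reproduced here.

```python
def find_pas_hexamers(sequence, hexamers=("AATAAA", "ATTAAA")):
    """Return a list of (position, hexamer) for each exact match of a
    canonical PAS hexamer. Positions are 0-based. RNA input is accepted
    by converting U to T before matching."""
    seq = sequence.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 5):
        window = seq[i:i + 6]
        if window in hexamers:
            hits.append((i, window))
    return hits


# Hypothetical sequence containing one copy of each canonical hexamer.
hits = find_pas_hexamers("GGAATAAACCATTAAAGG")
```

In a fuller analysis, each hit would mark a candidate region upstream of a poly(A) site whose surrounding sequence could then be passed to a folding program to estimate stability.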
Recommended Citation
Khaladkar, Mugdha, "A bioinformatics framework for RNA structure mining, motif discovery and polyadenylation analysis" (2009). Dissertations. 907.
https://digitalcommons.njit.edu/dissertations/907