Date of Award

Summer 2010

Document Type


Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)


Department

Computer Science

First Advisor

Barry Cohen

Second Advisor

Usman W. Roshan

Third Advisor

Narain Gehani

Fourth Advisor

Vincent Oria

Fifth Advisor

Michael Halper


Natural selection may occur at multiple levels of the biological hierarchy, including at the molecular level. It may occur on any phenotypic trait that evidences variation and that is heritable. This research uses computational methods to investigate whether the stability of the secondary structures of mRNAs has been the subject of natural selection.

The DNA sequence that codes for a particular target protein is only partially determined by that protein, since the redundancy of the genetic code permits multiple synonymous codons for most amino acids. An RNA transcript of a DNA protein template (gene) folds back on itself through complementary base pairing, resulting in an mRNA secondary structure. This mRNA secondary structure tends to have a configuration that minimizes free energy. Two synonymous mRNAs, coding for the identical protein with different sets of synonymous codons, will in general fold into different secondary structures with different minimum free energies (MFEs). The secondary structure of an mRNA is therefore a phenotypic trait that could be a target of natural selection.

Several related questions were investigated: 1) Is there natural selection on the stability of RNA secondary structure, across various types of organisms? 2) Does the MFE of microbial mRNAs correlate with the function of the target protein? 3) Is there evidence of natural selection on the nucleotide composition and/or secondary structure of the prefixes and suffixes of bacterial mRNAs? 4) Is there natural selection on the secondary structures and substructures of subviral RNAs?

These questions were investigated using large-scale simulations, based on the generation of sets of randomized synthetic mRNAs for particular genes. The secondary structure of each mRNA (naturally occurring and synthetic) was then computationally predicted. The experiments were performed on the complete sets of genes of a number of prokaryotes and eukaryotes. Two types of randomized experiments were performed on each genetic data set, providing an independent confirmation of the results. In the first method of randomization, synonymous mRNAs were generated for each gene, creating sequences that code for the identical protein, with a frequency of codon use characteristic of the organism. In the second method of randomization, the nucleotides of the mRNA were permuted in a manner that does not preserve the mRNA sequence's target protein, but exactly preserves the mRNA sequence's nucleotide and dinucleotide frequencies.
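The two randomization methods can be illustrated with a minimal Python sketch. It is not the code used in this research. The function names are hypothetical, codon usage is estimated from whatever coding sequences are supplied (with an assumed pseudocount of 1), and the dinucleotide-preserving permutation follows the Altschul-Erickson style of shuffle, which is one known way to preserve dinucleotide counts exactly; the abstract does not state which algorithm was actually used.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code, written for RNA (U instead of T).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def split_codons(seq):
    """Split a coding sequence into complete codons (any trailing bases are dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def translate(seq):
    return "".join(CODON_TABLE[c] for c in split_codons(seq))

def codon_usage(seqs):
    """Count codon occurrences over a collection of coding sequences."""
    return Counter(c for s in seqs for c in split_codons(s))

def synonymous_variant(seq, usage, rng):
    """Method 1: resample each codon among its synonyms, weighted by codon usage."""
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        by_aa[aa].append(codon)
    out = []
    for codon in split_codons(seq):
        choices = by_aa[CODON_TABLE[codon]]
        weights = [usage.get(c, 0) + 1 for c in choices]  # assumed pseudocount
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return "".join(out)

def dinucleotide_shuffle(seq, rng):
    """Method 2: permute nucleotides, preserving nucleotide and dinucleotide counts."""
    if len(seq) < 3:
        return seq
    last = seq[-1]
    edges = defaultdict(list)  # nucleotide -> list of its successors in seq
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    # Pick a final outgoing edge for every nucleotide except the last one, and
    # retry until following those edges from any nucleotide reaches `last`.
    while True:
        last_edge = {v: rng.choice(edges[v]) for v in edges if v != last}
        ok = True
        for v in last_edge:
            seen, cur = set(), v
            while cur != last and cur not in seen:
                seen.add(cur)
                cur = last_edge[cur]
            if cur != last:
                ok = False
                break
        if ok:
            break
    # Shuffle the remaining successors; the chosen final edge is used last.
    order = {}
    for v, succ in edges.items():
        rest = list(succ)
        if v != last:
            rest.remove(last_edge[v])
            rng.shuffle(rest)
            rest.append(last_edge[v])
        else:
            rng.shuffle(rest)
        order[v] = rest
    # Walk the successor lists from the first nucleotide to rebuild a sequence.
    used = {v: 0 for v in order}
    cur, out = seq[0], [seq[0]]
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)

rng = random.Random(0)
native = "AUGGCUAGCAAAGGUUUCGCUUAA"  # made-up toy sequence, not real data
print(synonymous_variant(native, codon_usage([native]), rng))
print(dinucleotide_shuffle(native, rng))
```

In this sketch the first function keeps the encoded protein fixed while varying codon choice, and the second keeps the first and last nucleotides and all dinucleotide counts fixed while generally changing the encoded protein.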

The MFE of each naturally occurring mRNA sequence was then compared with the MFEs of the corresponding randomized sequences. A pattern of deviation, across an entire organism, of the value of the MFE of the naturally occurring sequence from that of the corresponding randomized sequences is evidence of natural selection on the stability of the mRNA transcript.
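One way such a comparison could be expressed is sketched below. The abstract does not specify the statistic used, so the z-score and the empirical fraction here are assumptions chosen for illustration, the function names are hypothetical, and the MFE values are made-up numbers standing in for the output of a folding program.

```python
from statistics import mean, stdev

def mfe_z_score(native_mfe, randomized_mfes):
    """Standardized deviation of the native MFE from its randomized counterparts.

    A negative value means the native mRNA is more stable (lower MFE) than the
    average randomized sequence for the same gene.
    """
    return (native_mfe - mean(randomized_mfes)) / stdev(randomized_mfes)

def fraction_at_least_as_stable(native_mfe, randomized_mfes):
    """Fraction of randomized sequences whose MFE is at or below the native MFE."""
    return sum(m <= native_mfe for m in randomized_mfes) / len(randomized_mfes)

# Hypothetical MFEs in kcal/mol for one gene and five randomized versions of it.
native = -30.0
randomized = [-20.0, -22.0, -18.0, -21.0, -19.0]
print(mfe_z_score(native, randomized))
print(fraction_at_least_as_stable(native, randomized))
```

Collecting such per-gene scores over every gene of an organism would give the organism-wide distribution in which a consistent shift away from zero could be looked for.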

This research establishes that:

  1. In all prokaryotes studied, natural selection has favored highly stable (lower MFE) mRNAs. In some prokaryotes, natural selection has also favored highly unstable mRNAs. No statistically significant evidence of such selection was found in eukaryotes.
  2. The distributions of MFEs of mRNAs of 25 broad functional classes of proteins (COGs - Clusters of Orthologous Groups) of five microbes and yeast correlate with functional class.
  3. mRNA prefixes have a distinctive MFE signature. The naturally occurring prefixes display more structure, on average, than randomized sequences with identical nucleotide and dinucleotide content, suggesting that natural selection favors secondary structure in the prefix of mRNA.
  4. Viroids (with RNA genomes) have highly stable secondary structures, and these structures are similar among viroids belonging to the same family.

The results indicate that natural selection on the MFE of mRNA is widespread in the evolution of the genome.