Date of Award

Spring 2013

Document Type

Thesis

Degree Name

Master of Science in Bioinformatics - (M.S.)

Department

Computer Science

First Advisor

Zhi Wei

Second Advisor

Usman W. Roshan

Third Advisor

Egbert Ammicht

Abstract

Aligning millions of short reads to a reference genome is a critical task in high-throughput sequencing. In recent years, a large number of mapping algorithms have been developed, all of which align a vast number of reads to genomic or transcriptomic sequences. Because RNA-Seq data is discrete in nature, it can be simulated, given reasonable gene models and comparative metrics, with sufficient accuracy to enable meaningful benchmarking of alignment algorithms. To provide guidance in the choice of alignment algorithm, five alignment tools for RNA-Seq data are evaluated: Bowtie, Bowtie2, GMAP, Tophat and GNUMAP. Their accuracy and sensitivity are compared on approximately 1 million simulated reads from chromosome one. Bowtie has the highest accuracy (92.42%), while GMAP has the lowest (49.63%). Tophat has the highest sensitivity (71.35%), while GMAP has the lowest (51.69%).
