Date of Award

Spring 5-31-2006

Document Type

Thesis

Degree Name

Master of Science in Computational Biology - (M.S.)

Department

Computer Science

First Advisor

Jason T. L. Wang

Second Advisor

Qun Ma

Third Advisor

Vincent Oria

Abstract

SYSTERS is a biological information integration system that contains protein sequences from protein databases such as Swiss-Prot and TrEMBL, as well as protein sequences from complete genomes available at Ensembl, The Arabidopsis Information Resource, SGD, and GeneDB. For some of these protein sequences, the encoding nucleotide sequences can be found on the corresponding websites; for others, the encoding nucleotide sequences are missing.

The goal of this thesis is to collect the nucleotide sequences for all protein sequences in SYSTERS and store them in a common database. There are two cases. In the first case, the nucleotide sequences can be found, so we collect them and store them in our database. In the second case, the nucleotide sequences are missing, so we use back-translation and TBLASTN to search for the nucleotide sequences and then store the results in our database.
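
The second case can be illustrated with a minimal sketch. It is not the thesis's implementation and does not call TBLASTN; it is a simplified stand-in that translates a nucleotide sequence in all six reading frames using the standard genetic code and looks for an exact match to the protein, whereas TBLASTN performs a scored alignment that tolerates mismatches and gaps. The function names (translate, reverse_complement, find_coding_region) and the example sequences are hypothetical and chosen only for illustration.

```python
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(nucleotides):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nucleotides[i:i + 3], "X")
                   for i in range(0, len(nucleotides) - 2, 3))


def reverse_complement(nucleotides):
    """Return the reverse complement of an uppercase DNA string."""
    return nucleotides.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_coding_region(protein, nucleotides):
    """Search all six reading frames for an exact encoding of `protein`.

    Returns (strand, start, coding_sequence), where `start` is the 0-based
    offset on the reported strand, or None if no frame encodes the protein.
    This is an exact-match simplification of what a TBLASTN search does.
    """
    for strand, seq in (("+", nucleotides),
                        ("-", reverse_complement(nucleotides))):
        for frame in range(3):
            pos = translate(seq[frame:]).find(protein)
            if pos != -1:
                start = frame + 3 * pos
                return strand, start, seq[start:start + 3 * len(protein)]
    return None


if __name__ == "__main__":
    # Hypothetical toy data: the protein MKV is encoded in frame 2.
    print(find_coding_region("MKV", "CCATGAAAGTTTAA"))
```

In a real pipeline, the recovered coding sequence would be stored alongside the protein's identifier in the common database, and an alignment-based search would replace the exact match used here.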
