Document Type
Thesis
Date of Award
Spring 5-31-2006
Degree Name
Master of Science in Computational Biology - (M.S.)
Department
Computer Science
First Advisor
Jason T. L. Wang
Second Advisor
Qun Ma
Third Advisor
Vincent Oria
Abstract
SYSTERS is a biological information integration system containing protein sequences from many protein databases, such as Swiss-Prot and TrEMBL, as well as protein sequences from complete genomes available at Ensembl, The Arabidopsis Information Resource, SGD and GeneDB. For some of these protein sequences, the encoding nucleotide sequences can be found on the corresponding websites; for others, the encoding nucleotide sequences are missing.
The goal of this thesis is to collect all nucleotide sequences for the protein sequences in SYSTERS and store them in a common database. There are two cases. In the first case, the nucleotide sequences can be found, so we collect them and put them in our database. In the second case, the nucleotide sequences are missing, so we use back-translation and TBLASTN to search for the nucleotide sequences and store the results in our database.
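The two cases can be illustrated with a minimal sketch. It is not the toolkit's actual code: the names ProteinRecord, plan_collection and tblastn_command, and all file and database names, are hypothetical. It also assumes the tblastn program from the NCBI BLAST+ command-line suite, which may differ from the BLAST version used in the thesis. The sketch only routes records into the two cases and builds a search command; the back-translation step and the database storage are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProteinRecord:
    """Hypothetical record for one protein sequence."""
    protein_id: str
    protein_seq: str
    # None when no encoding nucleotide sequence was found at the source website.
    nucleotide_seq: Optional[str] = None


def plan_collection(
    records: List[ProteinRecord],
) -> Tuple[Dict[str, str], List[str]]:
    """Split records into case 1 (sequence found) and case 2 (sequence missing)."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for rec in records:
        if rec.nucleotide_seq:
            # Case 1: the nucleotide sequence is available and can be stored directly.
            found[rec.protein_id] = rec.nucleotide_seq
        else:
            # Case 2: the nucleotide sequence must be searched for.
            missing.append(rec.protein_id)
    return found, missing


def tblastn_command(query_fasta: str, db_name: str, out_path: str) -> List[str]:
    """Build (but do not run) a BLAST+ tblastn call for a protein query.

    Assumes BLAST+ is installed and db_name is a nucleotide database
    prepared beforehand; the output is requested in tabular format (6).
    """
    return [
        "tblastn",
        "-query", query_fasta,
        "-db", db_name,
        "-outfmt", "6",
        "-out", out_path,
    ]


if __name__ == "__main__":
    records = [
        ProteinRecord("P1", "MKT", "ATGAAAACT"),
        ProteinRecord("P2", "MAL"),
    ]
    found, missing = plan_collection(records)
    print("store directly:", sorted(found))
    print("search with TBLASTN:", missing)
    print(" ".join(tblastn_command("missing.fasta", "genome_db", "hits.tsv")))
```

The command list could be passed to subprocess.run once a query file and a nucleotide database exist; the hits would then still need to be filtered before being stored.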
Recommended Citation
Lokhandwala, Munira, "A data gathering toolkit for biological information integration" (2006). Theses. 1713.
https://digitalcommons.njit.edu/theses/1713