Document Type
Thesis
Date of Award
Fall 1-31-1999
Degree Name
Master of Science in Computer Science - (M.S.)
Department
Computer and Information Science
First Advisor
Jason T. L. Wang
Second Advisor
James M. Calvin
Third Advisor
Franz J. Kurfess
Abstract
A biomolecular object, such as a deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or protein molecule, is made up of a long chain of subunits. A protein can be represented as a sequence over an alphabet of 20 amino acids, each denoted by a single letter. Similar structural domains in proteins can arise from a vast number of different amino acid sequences. By contrast, the structure of DNA, which is built from only four different nucleotides that form two complementary base pairs (A with T and G with C), is relatively simple, regular, and predictable.
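To make the string representation concrete, here is a minimal Python sketch that treats proteins and DNA as strings over fixed alphabets and derives the complementary DNA strand from the standard A-T and G-C base pairing. The constant and function names are illustrative assumptions, not taken from the thesis.

```python
# The 20 standard amino acids, one letter each.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")

# Watson-Crick base pairs: the four nucleotides form two complementary pairs.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_protein(seq: str) -> bool:
    """True if seq is non-empty and uses only standard amino-acid letters."""
    return bool(seq) and set(seq) <= PROTEIN_ALPHABET


def is_dna(seq: str) -> bool:
    """True if seq is non-empty and uses only the letters A, C, G, T."""
    return bool(seq) and set(seq) <= set(DNA_PAIRS)


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in the conventional 5' to 3' direction."""
    return "".join(DNA_PAIRS[base] for base in reversed(dna))
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`. The pairing rule is what makes the DNA alphabet so regular compared with the 20-letter protein alphabet.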
Biomolecular sequence alignment (string search) is a central and challenging task in many areas of science and information processing. It involves identifying one-to-one correspondences between the subunits of different sequences. The design of an efficient algorithm or tool involves several important factors, including scoring systems, alignment statistics, database redundancy, and sequence repetitiveness.
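The alignment described above can be illustrated with the classic Needleman-Wunsch dynamic-programming algorithm for global alignment. This is a minimal sketch under an assumed linear scoring scheme (match +1, mismatch -1, gap -1) chosen only for illustration. Practical tools typically use substitution matrices such as PAM or BLOSUM together with affine gap penalties, and this code is not presented as the method of any tool compared in the thesis. The function name `global_align` is hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); "-" marks a gap.
    The default scores are illustrative, not those of any particular tool.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair the two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Each column of the two returned strings is one of the one-to-one correspondences between subunits. For instance, `global_align("ACGT", "ACGT")` returns `(4, "ACGT", "ACGT")`, and under this scoring scheme the optimal score for `"GATTACA"` against `"GCATGCU"` is 0. Changing `match`, `mismatch`, and `gap` shows how strongly the result depends on the scoring system.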
Sequence "motifs" are derived from multiple sequence alignments and can be used to search individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that may not be demonstrable from comparisons of primary sequences alone.
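Motifs are often written as patterns. The PROSITE database, for example, uses a syntax in which `x` matches any residue, `[..]` lists allowed residues, `{..}` lists forbidden residues, elements are separated by `-`, and `(n)` or `(n,m)` gives a repeat count. The sketch below translates a subset of that syntax (no `<` or `>` terminal anchors) into a Python regular expression and scans a sequence with it. The function names are hypothetical. The example pattern `[AG]-x(4)-G-K-[ST]` is the P-loop (Walker A) nucleotide-binding pattern, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # a single residue or a [ABC] class, same syntax in regex
        if lo and hi:
            regex += "{" + lo + "," + hi + "}"
        elif lo:
            regex += "{" + lo + "}"
        parts.append(regex)
    return "".join(parts)


def find_motif(pattern: str, sequence: str):
    """Return (start, matched_substring) for each non-overlapping match."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in rx.finditer(sequence)]
```

Here `prosite_to_regex("[AG]-x(4)-G-K-[ST].")` yields `"[AG].{4}GK[ST]"`, and `find_motif("[AG]-x(4)-G-K-[ST].", "MKAAAAAGKSV")` reports a single hit at position 2. Because `finditer` returns non-overlapping matches, a lookahead would be needed if overlapping hits matter.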
A more comprehensive approach to efficient string search is to build a small, representative set of motifs and use it as a screening database that automatically masks matching query subsequences. This technology is still under development, but recent studies indicate that a representative set of only 1,000 to 3,000 sequences may suffice and that such a database can be searched in seconds.
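The masking step can be sketched as follows. This sketch assumes the screening database is simply a list of motif regular expressions and that masked residues are replaced by `X` (`N` is the usual choice for DNA). It is a hypothetical illustration of the idea rather than the system referred to above, and `mask_query` and its parameters are invented for the example.

```python
import re


def mask_query(query: str, motifs: list, mask_char: str = "X") -> str:
    """Replace every query position covered by a motif match with mask_char.

    motifs: regular-expression strings standing in for a screening database
    (a hypothetical representation chosen for this sketch).
    """
    covered = [False] * len(query)
    for motif in motifs:
        # A lookahead finds overlapping matches as well as disjoint ones.
        for m in re.finditer("(?=(" + motif + "))", query):
            start = m.start()
            for k in range(start, start + len(m.group(1))):
                covered[k] = True
    return "".join(mask_char if hit else ch for ch, hit in zip(query, covered))
```

For example, `mask_query("MKAAAAAGKSV", [r"[AG].{4}GK[ST]"])` returns `"MKXXXXXXXXV"`. The masked query would then presumably be passed on to the full database search, so that regions already explained by the screening set do not generate redundant hits.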
Recommended Citation
Chuang, Wei-Jen, "A comparative study of sequence analysis tools in computational biology" (1999). Theses. 843.
https://digitalcommons.njit.edu/theses/843