Document Type

Thesis

Date of Award

Fall 1-31-1999

Degree Name

Master of Science in Computer Science - (M.S.)

Department

Computer and Information Science

First Advisor

Jason T. L. Wang

Second Advisor

James M. Calvin

Third Advisor

Franz J. Kurfess

Abstract

A biomolecular object, such as a deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or protein molecule, is made up of a long chain of subunits. A protein is represented as a sequence drawn from 20 different amino acids, each denoted by a letter. Different amino acid sequences can generate similar structural domains in proteins in a vast number of ways. By contrast, the structure of DNA, which is made up of only four different nucleotide building blocks that occur in two pairs, is relatively simple, regular, and predictable.

Biomolecular sequence alignment, or string search, is one of the most important and challenging tasks in many areas of science and information processing. It involves identifying one-to-one correspondences between subunits of different sequences. The efficiency of an alignment algorithm or tool depends on several important factors, including scoring systems, alignment statistics, database redundancy, and sequence repetitiveness.
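As an illustration of the alignment task described above, the following is a minimal sketch of global pairwise alignment by dynamic programming (the Needleman-Wunsch technique). It is not taken from the thesis and is not claimed to be one of the tools it compares. The function name `global_align` and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions chosen only for illustration; practical tools typically use substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not values from the thesis.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = global_align("ACGT", "ACT")
    print(s)
    print(x)
    print(y)
```

Each column of the two returned strings is one correspondence between subunits, with "-" marking a subunit that has no partner in the other sequence. Changing the three scoring parameters changes which alignment is considered best, which is why the choice of scoring system matters.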

Sequence "motifs" are derived from multiple alignments and can be used to examine individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that may not be demonstrable based on comparisons of primary sequences alone.

A more comprehensive approach to efficient string search is to build a small, representative set of motifs and use it as a screening database, with automatic masking of matching query subsequences. This technology is still under development, but recent studies indicate that a representative set of only 1,000 to 3,000 sequences may suffice and that such a database can be searched in seconds.

Recommended Citation

Chuang, Wei-Jen, "A comparative study of sequence analysis tools in computational biology" (1999). Theses. 843.
https://digitalcommons.njit.edu/theses/843

Included in

Computer Sciences Commons
