Date of Award

Fall 2005

Document Type

Thesis

Degree Name

Master of Science in Electrical Engineering - (M.S.)

Department

Electrical and Computer Engineering

First Advisor

Alexander Haimovich

Second Advisor

Ali N. Akansu

Third Advisor

Yun Q. Shi

Abstract

Deoxyribonucleic Acid (DNA) strand carries genetic information in the cell. A strand of DNA consists of nitrogenous molecules called nucleotides. Nucleotides triplets, or the codons, code for amino acids. There are two distinct regions in DNA, the gene and the intergenic DNA, or the junk DNA. Two regions can be distinguished in the gene- the exons, or the regions that code for amino acid, and the introns, or the regions that do not code for amino acid. The main aim of the thesis is to study signal processing techniques that help distinguish between the regions of the exons and the introns. Previous research has shown the fact that the exons can be considered as a sequence of signal and noise, whereas introns are noise-like sequences. Fourier Transform of an exonic sequence exhibits a peak at frequency sample value k N/3 where N is the length of the FFT transform. This property is referred to as the period -3 property. Unlike exons, introns have a noise-like spectrum. The factor that determines the performance efficiency of a transform is the figure of merit, defined as the ratio of the peak value to the arithmetic mean of all the values. A comparative study was conducted for the application of the Discrete Fourier Transform and the Karhunen Loeve Transform. Though both DFT and KLT of an exon sequence produce a higher figure of merit than that for an intron sequence, it is interesting to note that the difference in the figure of merits of exons and introns was higher when the KLT was applied to the sequence than when the DFT was applied. The two transforms were also applied on entire sequences in a sliding window fashion. Finally, the two transforms were applied on a large number of sequences from a variety of organisms. A Neyman Pearson based detector was used to obtain receiver operating curves, i.e., probability of detection versus probability of false alarm. When a transform is applied as a sliding window, the values for exons and introns are taken separately. The exons and the introns served as the two hypotheses of the detector. The Neyman Pearson detector helped indicate the fact the KLT worked better on a variety of organisms than the DFT.

Share

COinS