Document Type


Date of Award

Fall 1-31-2005

Degree Name

Master of Science in Computational Biology - (M.S.)

Department

Computer Science

First Advisor

Carol A. Venanzi

Second Advisor

Michael Recce

Third Advisor

Qun Ma

Abstract

Quantitative structure-activity relationship (QSAR) analysis seeks to build a predictive model of biological activity from molecular descriptors. 2D QSAR uses descriptors, such as topological indices, that do not depend on molecular conformation. A genetic algorithm-partial least squares (GA-PLS) approach was used to identify the molecular descriptors that correlate with the biological activity (binding affinity) of a set of 80 methylphenidate analogues and to construct a predictive model. The GA scored candidate descriptor subsets with the fitness function 1 − (n − 1)(1 − q²)/(n − c), where n is the number of compounds, c is the optimal number of PLS components, and q² is the cross-validated squared correlation coefficient. Like an adjusted r², this function penalizes models that use more components. Partial least squares (PLS) regression was then applied to the selected descriptors to build a predictive model of biological activity (q² = 0.78, fitness = 0.77). This model can assist in the design of improved methylphenidate analogues for the treatment of cocaine abuse. The GA-PLS program was also tested on the benchmark Selwood dataset of antifilarial antimycin analogues, where it identified several molecular descriptors in common with other 2D QSAR models.
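The GA-PLS loop described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the thesis code: q² is assumed to come from leave-one-out cross-validation, PLS is a single-response NIPALS fit, "optimal c" is taken to be the component count (up to a hypothetical cap `cmax`) that maximizes the fitness, and the GA operators (elitism, size-3 tournaments, one-point crossover, bit-flip mutation) and all function names (`pls1_fit`, `loo_q2`, `fitness`, `evaluate`, `ga_select`) are hypothetical choices. Only the fitness formula itself comes from the abstract. The demo data are synthetic, not the methylphenidate or Selwood sets.

```python
import random

def mean(v):
    return sum(v) / len(v)

def pls1_fit(X, y, c):
    """Fit a single-response PLS model with up to c latent components (NIPALS)."""
    n, p = len(X), len(X[0])
    xm = [mean([row[j] for row in X]) for j in range(p)]
    ym = mean(y)
    E = [[row[j] - xm[j] for j in range(p)] for row in X]  # centered descriptors
    f = [v - ym for v in y]                                 # centered activity
    comps = []
    for _ in range(c):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break  # activity already fully explained
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        if tt == 0:
            break
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        E = [[E[i][j] - t[i] * pl[j] for j in range(p)] for i in range(n)]    # deflate X
        f = [f[i] - t[i] * q for i in range(n)]                               # deflate y
        comps.append((w, pl, q))

    def predict(x):
        e = [x[j] - xm[j] for j in range(p)]
        yhat = ym
        for w, pl, q in comps:
            t = sum(e[j] * w[j] for j in range(p))
            yhat += t * q
            e = [e[j] - t * pl[j] for j in range(p)]
        return yhat

    return predict

def loo_q2(X, y, c):
    """Leave-one-out q^2 = 1 - PRESS / SS (assumed cross-validation scheme)."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], c)
        press += (y[i] - model(X[i])) ** 2
    ym = mean(y)
    return 1 - press / sum((v - ym) ** 2 for v in y)

def fitness(q2, n, c):
    """GA fitness from the abstract: 1 - (n - 1)(1 - q^2) / (n - c)."""
    return 1 - (n - 1) * (1 - q2) / (n - c)

def evaluate(mask, X, y, cmax=5):
    """Best (fitness, c) over c = 1..cmax for the descriptors switched on in mask."""
    cols = [j for j, on in enumerate(mask) if on]
    if not cols:
        return float("-inf"), 0
    Xs = [[row[j] for j in cols] for row in X]
    return max((fitness(loo_q2(Xs, y, c), len(y), c), c)
               for c in range(1, min(cmax, len(cols)) + 1))

def ga_select(X, y, pop_size=16, generations=20, p_mut=0.05, cmax=5, seed=0):
    """Toy GA over binary descriptor masks (needs at least 2 descriptors)."""
    rng = random.Random(seed)
    p = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(p)] for _ in range(pop_size)]
    cache = {}

    def score(m):
        key = tuple(m)
        if key not in cache:
            cache[key] = evaluate(m, X, y, cmax)[0]
        return cache[key]

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        nxt = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(nxt) < pop_size:
            a, b = (max(rng.sample(ranked, 3), key=score) for _ in range(2))  # tournaments
            cut = rng.randrange(1, p)  # one-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best, evaluate(best, X, y, cmax)

if __name__ == "__main__":
    # Synthetic data (not from the thesis): activity depends on descriptors 0 and 2 only.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(20)]
    y = [2 * r[0] - r[2] for r in X]
    mask, (fit, c) = ga_select(X, y)
    print("selected:", [j for j, on in enumerate(mask) if on], "fitness:", round(fit, 3), "c:", c)
```

Because the synthetic activity is an exact linear function of descriptors 0 and 2, the selected mask should contain both of them. The fitness formula also shows the complexity penalty at work: purely as an illustration (the abstract does not state c), n = 80 and q² = 0.78 with c = 4 would give 1 − 79 × 0.22 / 76 ≈ 0.771, slightly below q² itself.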