Document Type


Date of Award

Spring 5-31-2006

Degree Name

Master of Science in Computer Science - (M.S.)


Computer Science

First Advisor

Barry Cohen

Second Advisor

Yehoshua Perl

Third Advisor

Michael Halper


Knowledge discovery about a gene requires details such as its organism and chromosomal location. Furthermore, extensive biomedical research in genomics has led to the discovery of the processes and diseases in which a gene plays a role. In the Gene hierarchy of the NCI Thesaurus (NCIT), such knowledge is represented by appropriate roles. However, a review of the Gene hierarchy of the NCIT reveals many role errors. Because the same details are provided by another NIH knowledge repository, the NCBI gene database, a methodology is presented that uses NCBI to discover role errors in the Gene hierarchy.

To this end, a web crawler was developed to retrieve this knowledge from NCBI and represent it in a form compatible with the NCIT gene roles, to facilitate comparison. The most difficult challenge lies with process roles: one gene may play a role in several processes, so several targets can exist for a single gene. A procedure is developed to explore process role errors in the Gene hierarchy of the NCIT, utilizing the Biological Process hierarchy of the NCIT and the NCBI gene database.
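
The comparison step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the data layout, the function name find_role_discrepancies, the role labels, and the sample gene are all assumptions made for illustration. It assumes the roles from both sources have already been normalized into a mapping from gene to role to a set of targets, which handles the case where one gene has several process targets.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: gene symbol -> role name -> set of target values.
RoleMap = Dict[str, Dict[str, Set[str]]]


def find_role_discrepancies(
    ncit: RoleMap, ncbi: RoleMap
) -> List[Tuple[str, str, Set[str], Set[str]]]:
    """Compare role targets for genes present in both sources.

    Returns one tuple per mismatching (gene, role):
    (gene, role, targets only in NCBI, targets only in NCIT).
    """
    discrepancies = []
    for gene in sorted(ncit.keys() & ncbi.keys()):
        roles = ncit[gene].keys() | ncbi[gene].keys()
        for role in sorted(roles):
            ncit_targets = ncit[gene].get(role, set())
            ncbi_targets = ncbi[gene].get(role, set())
            only_in_ncbi = ncbi_targets - ncit_targets
            only_in_ncit = ncit_targets - ncbi_targets
            if only_in_ncbi or only_in_ncit:
                discrepancies.append((gene, role, only_in_ncbi, only_in_ncit))
    return discrepancies


# Made-up illustrative records, not real NCIT or NCBI data.
ncit_roles: RoleMap = {
    "GENE_A": {
        "organism": {"Human"},
        "process": {"Apoptosis"},
    }
}
ncbi_roles: RoleMap = {
    "GENE_A": {
        "organism": {"Human"},
        "process": {"Apoptosis", "Cell Cycle"},
    }
}

for gene, role, only_ncbi, only_ncit in find_role_discrepancies(ncit_roles, ncbi_roles):
    print(gene, role, sorted(only_ncbi), sorted(only_ncit))
```

A flagged difference is only a candidate error. For process roles in particular, a target may differ simply because the two sources name a process at different levels of the Biological Process hierarchy, so each candidate would still need review.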


