Document Type
Thesis
Date of Award
Spring 5-31-2006
Degree Name
Master of Science in Computer Science - (M.S.)
Department
Computer Science
First Advisor
Barry Cohen
Second Advisor
Yehoshua Perl
Third Advisor
Michael Halper
Abstract
As a result of the knowledge discovery process for gene information, details such as a gene's organism and chromosomal location should be known. Furthermore, extensive biomedical research in genomics has led to the discovery of the processes and diseases in which a gene plays a role. In the Gene hierarchy of the NCI Thesaurus (NCIT), such knowledge is represented by appropriate roles. However, a review of the Gene hierarchy of the NCIT reveals many role errors. Because the same details are provided by another NIH knowledge repository, the NCBI gene database, a methodology is presented that uses the NCBI to discover role errors in the Gene hierarchy.
To this end, a web crawler was developed to retrieve the knowledge from the NCBI so that it could be represented in a form compatible with the NCIT gene roles, which facilitates comparison. The most difficult challenge lies with process roles: one gene may play a role in several processes, so several role targets exist for a single gene. A procedure is developed to explore process role errors in the Gene hierarchy of the NCIT, using the Biological Process hierarchy of the NCIT and the NCBI gene database.
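The comparison of multi-target process roles described above can be illustrated with a minimal sketch. It is not the thesis's procedure: it assumes the process targets for one gene have already been extracted from each source as plain strings, it compares them by normalized name only, and it ignores the parent/child relationships of the Biological Process hierarchy. The function name, the result keys, and the sample process names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def compare_process_roles(ncit: Set[str], ncbi: Set[str]) -> Dict[str, Set[str]]:
    """Compare the process-role targets recorded for one gene in two sources.

    Assumption: targets are matched by case-insensitive, whitespace-trimmed
    name only; no hierarchy or synonym information is used.
    """
    ncit_norm = {t.strip().lower() for t in ncit}
    ncbi_norm = {t.strip().lower() for t in ncbi}
    return {
        # Targets present in both sources.
        "agreed": ncit_norm & ncbi_norm,
        # Targets only in the NCIT: candidates for review as possible errors.
        "only_in_ncit": ncit_norm - ncbi_norm,
        # Targets only in the NCBI: candidates for possibly missing roles.
        "only_in_ncbi": ncbi_norm - ncit_norm,
    }


# Illustrative data only; these are not actual roles of any gene.
result = compare_process_roles(
    ncit={"Apoptosis", "Cell Cycle"},
    ncbi={"apoptosis", "DNA Repair"},
)
print(result)
```

A mismatch reported by such a comparison is only a candidate for review; deciding whether it is an actual role error would still require consulting the hierarchy and the source records.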
Recommended Citation
Oren, Marc, "Finding role errors of the NCIT gene hierarchy using the NCBI" (2006). Theses. 432.
https://digitalcommons.njit.edu/theses/432