Date of Award
Master of Science in Computer Science - (M.S.)
The knowledge discovery process for gene information is expected to establish details such as the organism and chromosomal location of a gene. Furthermore, extensive biomedical research in genomics has led to the discovery of processes and diseases in which a gene plays a role. In the Gene hierarchy of the NCI Thesaurus (NCIT), such knowledge is represented by appropriate roles. However, a review of the Gene hierarchy of the NCIT reveals many role errors. Because such details are also provided by another NIH knowledge repository, the NCBI gene database, a methodology is presented that uses the NCBI to discover role errors in the Gene hierarchy.
For this purpose, a web crawler was developed to retrieve the knowledge from the NCBI so that it could be represented in a form compatible with the NCIT gene roles, facilitating comparison. The most difficult challenge lies with process roles, because a single gene may play a role in several processes, so several targets exist for one gene. A procedure is developed to explore process role errors in the Gene hierarchy of the NCIT, utilizing the Biological Process hierarchy of the NCIT and the NCBI gene database.
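The comparison step described above can be illustrated with a minimal sketch. It is not the thesis's procedure: the function name, the role keys ("organism", "chromosomal_location", "processes"), and the sample values are all hypothetical, and the sketch assumes both sources have already been converted to the same representation. Single-valued roles are compared directly, while the multi-target process role is compared as a set. The use of the Biological Process hierarchy is not modeled here.

```python
# Hypothetical role keys, chosen only for this illustration.
SINGLE_VALUED_ROLES = ("organism", "chromosomal_location")


def find_role_discrepancies(ncit_roles, ncbi_roles):
    """Compare one gene's roles as recorded in two sources.

    Each argument is a dict with string values for the single-valued
    roles and a collection of process names under the key "processes".
    Returns a list of tuples:
      (role, ncit_value, ncbi_value) for single-valued roles, and
      ("processes", only_in_ncit, only_in_ncbi) for process targets.
    """
    discrepancies = []

    # Single-valued roles: a mismatch is a candidate role error.
    for role in SINGLE_VALUED_ROLES:
        ncit_value = ncit_roles.get(role)
        ncbi_value = ncbi_roles.get(role)
        if ncit_value != ncbi_value:
            discrepancies.append((role, ncit_value, ncbi_value))

    # Process roles: one gene may have several targets, so compare sets.
    ncit_processes = set(ncit_roles.get("processes", ()))
    ncbi_processes = set(ncbi_roles.get("processes", ()))
    if ncit_processes != ncbi_processes:
        only_in_ncit = sorted(ncit_processes - ncbi_processes)
        only_in_ncbi = sorted(ncbi_processes - ncit_processes)
        discrepancies.append(("processes", only_in_ncit, only_in_ncbi))

    return discrepancies


# Made-up sample data for a fictional gene; not taken from NCIT or NCBI.
ncit_gene = {
    "organism": "Human",
    "chromosomal_location": "1p10",
    "processes": {"Apoptosis"},
}
ncbi_gene = {
    "organism": "Human",
    "chromosomal_location": "1p11",
    "processes": {"Apoptosis", "Cell Cycle"},
}

for item in find_role_discrepancies(ncit_gene, ncbi_gene):
    print(item)
```

Each reported tuple is only a candidate error that still needs review, since a difference may reflect naming or granularity rather than a genuine mistake in either source.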
Oren, Marc, "Finding role errors of the NCIT gene hierarchy using the NCBI" (2006). Theses. 432.