Date of Award
Doctor of Philosophy in Computing Sciences - (Ph.D.)
James A. McHugh
Advanced technologies have resulted in the generation of large amounts of data ("Big Data"). The Big Knowledge derived from Big Data could be beyond humans' ability of comprehension, which will limit the effective and innovative use of Big Knowledge repository. Biomedical ontologies, which play important roles in biomedical information systems, constitute one kind of Big Knowledge repository. Biomedical ontologies typically consist of domain knowledge assertions expressed by the semantic connections between tens of thousands of concepts. Without some high-level visual representation of Big Knowledge in biomedical ontologies, humans cannot grasp the "big picture" of those ontologies. Such Big Knowledge orientation is required for the proper maintenance of ontologies and their effective use. This dissertation is addressing the Big Knowledge challenge - How to enable humans to use Big Knowledge correctly and effectively (referred to as the "Big Knowledge to Use" (BK2U) problem) - with a focus on biomedical ontologies.
In previous work, Abstraction Networks (AbNs) have been demonstrated successful for the summarization, visualization and quality assurance (QA) of biomedical ontologies. Based on the previous research, this dissertation introduces new AbNs of various granularities for Big Knowledge summarization and extends the applications of AbNs. This dissertation consists of three main parts. The first part introduces two advanced AbNs. One is the weighted aggregate partial-area taxonomy with a parameter to flexibly control the summarization granularity. The second is the Ingredient Abstraction Network (IAbN) for the National Drug File - Reference Terminology (NDF-RT) Chemical Ingredients hierarchy, for which the previously developed AbNs for hierarchies with outgoing relationships, are not applicable. Since NDF-RT's Chemical Ingredients hierarchy has no outgoing relationships.
The second part describes applications of the two advanced AbNs. A study utilizing the weighted aggregate partial-area taxonomy for the identification of major topics in SNOMED CT's Specimen hierarchy is reported. A multi-layer interactive visualization system of required granularity for ontology comprehension, based on the weighted aggregate partial-area taxonomy, is demonstrated to comprehend the Neoplasm subhierarchy of National Cancer Institute thesaurus (NCIt). The IAbN is applied for drug-drug interaction (DDI) discovery.
The third part reports eight family-based QA studies on NCIt's Neoplasm, Gene, and Biological Process hierarchies, SNOMED CT's Infectious disease hierarchy, the Chemical Entities of Biological Interest ontology, and the Chemical Ingredients hierarchy in NDF-RT. There is no one-size-fits-all QA method and it is impossible to find a QA method for each individual ontology. Hence, family-based QA is an effective way, i.e., one QA technique could be applicable to a whole family of structurally similar ontologies. The results of these studies demonstrate that "complex concepts" and "uncommonly modeled concepts" are more likely to have errors. Furthermore, the three studies on overlapping concepts in partial-area taxonomies reported in this dissertation combined with previous three studies prove the success of "overlapping concepts" as a QA methodology for a whole family of 76 similar ontologies in BioPortal.
Zheng, Ling, "Applications of big knowledge summarization" (2018). Dissertations. 1385.