Date of Award

Fall 2013

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

James Geller

Second Advisor

Yehoshua Perl

Third Advisor

Mei Liu

Fourth Advisor

Michael Halper

Fifth Advisor

Gai Elhanan

Sixth Advisor

Chunhua Weng

Abstract

Biomedical terminologies and ontologies underlie various Health Information Systems (HISs), Electronic Health Record (EHR) Systems, Health Information Exchanges (HIEs) and health administrative systems. Moreover, the proliferation of interdisciplinary research efforts in the biomedical field is fueling the need to overcome terminological barriers when integrating knowledge from different fields into a unified research project. Therefore well-developed and well-maintained terminologies are in high demand. Most of the biomedical terminologies are large and complex, which makes it impossible for human experts to manually detect and correct all errors and inconsistencies. Automated and semi-automated Quality Assurance methodologies that focus on areas that are more likely to contain errors and inconsistencies are therefore important.

In this dissertation, structural and semantic methodologies are used to enhance biomedical terminologies. The dissertation work is divided into three major parts. The first part consists of structural auditing techniques for the Semantic Network of the Unified Medical Language System (UMLS), which serves as a vocabulary knowledge base for biomedical research in various applications. Research techniques are presented on how to automatically identify and prevent erroneous semantic type assignments to concepts. The Web-based adviseEditor system is introduced to help UMLS editors to make correct multiple semantic type assignments to concepts. It is made available to the National Library of Medicine for future use in maintaining the UMLS.

The second part of this dissertation is on how to enhance the conceptual content of SNOMED CT by methods of semantic harmonization. By 2015, SNOMED will become the standard terminology for EH R encoding of diagnoses and problem lists. In order to enrich the semantics and coverage of SNOMED CT for clinical and research applications, the problem of semantic harmonization between SNOMED CT and six reference terminologies is approached by 1) comparing the vertical density of SNOM ED CT with the reference terminologies to find potential concepts for export and import; and 2) categorizing the relationships between structurally congruent concepts from pairs of terminologies, with SNOMED CT being one terminology in the pair. Six kinds of configurations are observed, e.g., alternative classifications, and suggested synonyms. For each configuration, a corresponding solution is presented for enhancing one or both of the terminologies.

The third part applies Quality Assurance techniques based on “Abstraction Networks” to biomedical ontologies in BioPortal. The National Center for Biomedical Ontology provides B ioPortal as a repository of over 350 biomedical ontologies covering a wide range of domains. It is extremely difficult to design a new Quality Assurance methodology for each ontology in BioPortal. Fortunately, groups of ontologies in BioPortal share common structural features. Thus, they can be grouped into families based on combinations of these features. A uniform Quality Assurance methodology design for each family will achieve improved efficiency, which is critical with the limited Quality Assurance resources available to most ontology curators. In this dissertation, a family-based framework covering 186 BioPortal ontologies and accompanying Quality Assurance methods based on abstraction networks are presented to tackle this problem.

Share

COinS