Dissertations

A methodological framework for ontology development, enrichment, and application in natural language processing tasks

Navya Martin Kollapally, New Jersey Institute of Technology

Author ORCID Identifier

0000-0003-4004-6508

Document Type

Dissertation

Date of Award

8-31-2024

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

James Geller

Second Advisor

Yehoshua Perl

Third Advisor

Senjuti Basu Roy

Fourth Advisor

Shantanu Sharma

Fifth Advisor

Lijing Wang

Sixth Advisor

Zhe He

Abstract

Electronic Health Records (EHRs) have been widely used in healthcare to record demographics, vital signs, test results, immunizations, medical imaging reports, differential diagnoses, etc. It is now accepted that non-clinical (e.g., social) factors have a substantial influence on health outcomes. Hence, it is desirable to record these Social and Commercial Determinants of Health (SDoH & CDoH) in an EHR. The "non-text parts" of EHR notes (e.g., data tables) rely on coded terms from underlying ontologies or terminologies to facilitate semantic interoperability. Ontologies help define concepts, the relationships between them, and instances that can be utilized in research.

The first accomplishment of this dissertation is the development of four ontologies covering elements of SDoH and CDoH: i) Health Ontology for Minority Equity (HOME); ii) Social Determinant of Health Ontology (SOHO); iii) Commercial Determinants of Health Ontology (CDoH); iv) Non-clinical Determinants of Health Ontology (N-CDoH). These ontologies are designed to improve the representation of clinical and social data, to address gaps in existing reference ontologies and terminologies, and to capture fine-granularity concepts to be recorded in EHRs.

Ontology evaluation is defined as the process of determining the quality of an ontology considering a set of evaluation criteria. A major step in the ontology lifecycle is this evaluation for consistency, coherence, and semantic correctness. This dissertation presents a methodology for human expert evaluation, analyzing whether the developed ontology covers the knowledge of the domain under consideration correctly and to a sufficient degree.

After developing those ontologies, the next important task addressed in this dissertation is developing methods for semi-automatic enrichment of their contents. With the advent of Large Language Models (LLMs), this dissertation demonstrates the possibility of using LLMs to enrich ontologies by extracting concepts and semantic triples from PubMed, a major repository of medical research articles.
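As a rough illustration of what enriching an ontology with extracted semantic triples can involve, the following minimal sketch filters candidate triples against those an ontology already holds. It is not the dissertation's method: the `Triple` type, the `normalize` and `new_triples` helpers, and the sample terms are all hypothetical, and the LLM extraction step itself is assumed to have happened elsewhere.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """A (subject, predicate, object) statement; hypothetical structure."""
    subject: str
    predicate: str
    obj: str


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def key(triple: Triple) -> Tuple[str, str, str]:
    """Normalized form of a triple, used only for duplicate detection."""
    return (normalize(triple.subject),
            normalize(triple.predicate),
            normalize(triple.obj))


def new_triples(existing: Iterable[Triple],
                candidates: Iterable[Triple]) -> List[Triple]:
    """Return candidates not already present, in their original order.

    The input collections are left unchanged, so a human expert can
    review the returned list before anything is added to the ontology.
    """
    seen: Set[Tuple[str, str, str]] = {key(t) for t in existing}
    fresh: List[Triple] = []
    for candidate in candidates:
        k = key(candidate)
        if k not in seen:
            seen.add(k)
            fresh.append(candidate)
    return fresh


# Illustrative data only; these triples are assumptions, not taken from
# the dissertation's ontologies or from PubMed.
ontology = [Triple("Food insecurity", "is_a", "Social determinant of health")]
extracted = [
    Triple("food  insecurity", "IS_A", "social determinant of health"),
    Triple("Housing instability", "is_a", "Social determinant of health"),
    Triple("Housing instability", "is_a", "Social determinant of health"),
]
proposed = new_triples(ontology, extracted)
```

Returning proposals rather than mutating the ontology keeps the process semi-automatic: extraction suggests, and a reviewer decides.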

Next, the dissertation presents the application of an ontology to two important NLP tasks: 1) hyperparameter optimization (of a neural network model) for text classification, and 2) Clinical Named Entity Recognition (Clinical NER). In application 1), the goal is to identify, from a large set of clinical text notes, the samples that express a social determinant of health for a specific patient in an EHR. Genetic algorithm-based hyperparameter optimization is used to identify optimal hyperparameters. In application 2), preliminary studies revealed that reference ontologies and terminologies do not contain many of the fine-granularity concepts frequently recorded in EHR notes. This dissertation demonstrates the enrichment of a Cardiology Interface Terminology (CIT), dedicated to highlighting EHR notes of cardiology patients, using the Clinical NER approach.
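To make the idea of genetic algorithm-based hyperparameter optimization concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the search space, the `toy_fitness` function (a stand-in for training a classifier and measuring validation accuracy), and the population settings do not come from the dissertation.

```python
import random

# Hypothetical discrete search space; values are illustrative only.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "hidden_units": [16, 32, 64, 128],
    "dropout": [0.0, 0.2, 0.5],
}


def toy_fitness(params):
    """Stand-in for validation accuracy; a real run would train a model."""
    score = 0.0
    score += 1.0 if params["learning_rate"] == 1e-3 else 0.0
    score += 1.0 if params["hidden_units"] == 64 else 0.0
    score += 1.0 if params["dropout"] == 0.2 else 0.0
    return score


def random_individual(rng):
    """Sample one value per hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one parent at random."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(individual, rng, rate=0.2):
    """Resample each gene with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in individual.items()
    }


def genetic_search(fitness, generations=20, pop_size=12, elite=2, seed=0):
    """Evolve a population and return the fittest individual found."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the top individuals survive unchanged.
        population = population[:elite] + children
    return max(population, key=fitness)


best = genetic_search(toy_fitness)
```

Because the elite individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next. In practice the fitness call dominates the cost, so caching scores for already-evaluated configurations is a common refinement.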

Finally, this dissertation also demonstrates the dangers of re-identification of medical data by LLMs while performing a simple text classification task using "quantized versions" of Llama 2, Flan, Mistral, and Vicuna, four popular LLMs.

Recommended Citation

Kollapally, Navya Martin, "A methodological framework for ontology development, enrichment, and application in natural language processing tasks" (2024). Dissertations. 1777.
https://digitalcommons.njit.edu/dissertations/1777

Included in

Computer Sciences Commons, Data Science Commons
