Author ORCID Identifier
0000-0003-4004-6508
Document Type
Dissertation
Date of Award
8-31-2024
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
James Geller
Second Advisor
Yehoshua Perl
Third Advisor
Senjuti Basu Roy
Fourth Advisor
Shantanu Sharma
Fifth Advisor
Lijing Wang
Sixth Advisor
Zhe He
Abstract
Electronic Health Records (EHRs) have been widely used in healthcare to record demographics, vital signs, test results, immunizations, medical imaging reports, differential diagnoses, etc. It is now accepted that non-clinical (e.g., social) factors have a substantial influence on health outcomes. Hence, it is desirable to record these Social and Commercial Determinants of Health (SDoH & CDoH) in an EHR. The "non-text parts" of EHR notes (e.g., data tables) rely on coded terms from underlying ontologies or terminologies to facilitate semantic interoperability. Ontologies help define concepts, the relationships between them, and instances that can be utilized in research.
The first accomplishment of this dissertation is the development of four ontologies covering elements of SDoH and CDoH: i) Health Ontology for Minority Equity (HOME); ii) Social Determinant of Health Ontology (SOHO); iii) Commercial Determinants of Health Ontology (CDoH); iv) Non-clinical Determinants of Health Ontology (N-CDoH). These ontologies are designed to improve the representation of clinical and social data, to address gaps in existing reference ontologies and terminologies, and to capture fine-granularity concepts to be recorded in EHRs.
Ontology evaluation is defined as the process of determining the quality of an ontology against a set of evaluation criteria. Evaluating an ontology for consistency, coherence, and semantic correctness is a major step in the ontology lifecycle. This dissertation presents a methodology for human expert evaluation that analyzes whether a developed ontology covers the knowledge of the domain under consideration correctly and to a sufficient degree.
After developing those ontologies, the next important task addressed in this dissertation is developing methods for semi-automatic enrichment of their contents. With the advent of Large Language Models (LLMs), this dissertation demonstrates the possibility of using LLMs to enrich ontologies by extracting concepts and semantic triples from PubMed, a major repository of medical research articles.
Next, the dissertation presents the application of an ontology to two important NLP tasks: 1) hyperparameter optimization (of a neural network model) for text classification, and 2) Clinical Named Entity Recognition (Clinical NER). In application 1), the goal is to identify, from a large set of clinical text notes, the samples that express a sentiment about social determinants of health concerning a specific patient in an EHR. Genetic algorithm-based hyperparameter optimization is used to identify optimal hyperparameters. In application 2), preliminary studies revealed that reference ontologies and terminologies do not contain many of the fine-granularity concepts frequently recorded in EHR notes. This dissertation demonstrates the enrichment of a Cardiology Interface Terminology (CIT) dedicated to highlighting EHR notes of cardiology patients using the Clinical NER approach.
Finally, this dissertation demonstrates the danger of LLMs re-identifying medical data while performing a simple text classification task, using "quantized versions" of four popular LLMs: Llama 2, Flan, Mistral, and Vicuna.
Recommended Citation
Kollapally, Navya Martin, "A methodological framework for ontology development, enrichment, and application in natural language processing tasks" (2024). Dissertations. 1777.
https://digitalcommons.njit.edu/dissertations/1777