Author ORCID Identifier

0000-0003-4004-6508

Document Type

Dissertation

Date of Award

8-31-2024

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

James Geller

Second Advisor

Yehoshua Perl

Third Advisor

Senjuti Basu Roy

Fourth Advisor

Shantanu Sharma

Fifth Advisor

Lijing Wang

Sixth Advisor

Zhe He

Abstract

Electronic Health Records (EHRs) have been widely used in healthcare to record demographics, vital signs, test results, immunizations, medical imaging reports, differential diagnoses, etc. It is now accepted that non-clinical (e.g., social) factors have a substantial influence on health outcomes. Hence, it is desirable to record these Social and Commercial Determinants of Health (SDoH & CDoH) in an EHR. The "non-text parts" of EHR notes (e.g., data tables) rely on coded terms from underlying ontologies or terminologies to facilitate semantic interoperability. Ontologies help define concepts, the relationships between them, and instances that can be utilized in research.

The first accomplishment of this dissertation is the development of four ontologies covering elements of SDoH and CDoH: i) Health Ontology for Minority Equity (HOME); ii) Social Determinant of Health Ontology (SOHO); iii) Commercial Determinants of Health Ontology (CDoH); iv) Non-clinical Determinants of Health Ontology (N-CDoH). These ontologies are designed to improve the representation of clinical and social data, to address gaps in existing reference ontologies and terminologies, and to capture the fine-grained concepts that need to be recorded in EHRs.

Ontology evaluation is the process of determining the quality of an ontology against a set of evaluation criteria. Evaluating an ontology for consistency, coherence, and semantic correctness is a major step in the ontology lifecycle. This dissertation presents a methodology for human expert evaluation that analyzes whether the developed ontology covers the knowledge of the domain under consideration correctly and to a sufficient degree.

After developing these ontologies, the next important task addressed in this dissertation is the development of methods for semi-automatic enrichment of their contents. With the advent of Large Language Models (LLMs), this dissertation demonstrates the possibility of using LLMs to enrich ontologies by extracting concepts and semantic triples from PubMed, a major repository of medical research articles.

Next, the dissertation presents the application of an ontology to two important NLP tasks: 1) hyperparameter optimization (of a neural network model) for text classification, and 2) Clinical Named Entity Recognition (NER). In application 1), the goal is to identify, within a large set of clinical text notes, the samples that express information about the social determinants of health of a specific patient in an EHR. Genetic algorithm-based hyperparameter optimization is used to find optimal hyperparameters. In application 2), preliminary studies revealed that reference ontologies and terminologies do not contain many of the fine-grained concepts frequently recorded in EHR notes. This dissertation demonstrates the enrichment of a Cardiology Interface Terminology (CIT), dedicated to highlighting EHR notes of cardiology patients, using a Clinical Named Entity Recognition (Clinical NER) approach.
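The abstract does not describe how the genetic algorithm in application 1) is configured, so the following is only a minimal sketch of genetic-algorithm hyperparameter search in general, assuming a discrete search space. The hyperparameter names and ranges, the population size, and the toy `fitness` function are hypothetical placeholders; in a real setting `fitness` would train the text classifier with the given configuration and return a validation score. Tournament selection, uniform crossover, per-gene mutation, and elitism are common GA choices and are not necessarily the ones used in the dissertation.

```python
import random

# Hypothetical discrete search space (not taken from the dissertation).
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "hidden_units": [64, 128, 256, 512],
    "dropout": [0.1, 0.2, 0.3, 0.5],
    "batch_size": [16, 32, 64],
}
KEYS = list(SEARCH_SPACE)

# Toy stand-in for "train the model and return a validation score":
# it counts how many genes match an arbitrary hypothetical target.
TARGET = {"learning_rate": 1e-3, "hidden_units": 256, "dropout": 0.2, "batch_size": 32}


def fitness(config):
    return sum(config[k] == TARGET[k] for k in KEYS)


def random_config(rng):
    # Sample each hyperparameter independently from its allowed values.
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in KEYS}


def mutate(config, rng, rate=0.1):
    # With probability `rate`, resample a gene from its allowed values.
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in config.items()}


def tournament(pop, scores, rng, k=3):
    # Draw k distinct individuals and return the fittest of them.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def genetic_search(pop_size=20, generations=30, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        # Elitism: the best `elite` configurations survive unchanged.
        next_pop = [pop[i] for i in ranked[:elite]]
        while len(next_pop) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            next_pop.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = next_pop
    return max(pop, key=fitness)


if __name__ == "__main__":
    best = genetic_search()
    print(best, fitness(best))
```

Because elitism carries the best configurations forward, the best fitness in the population never decreases from one generation to the next. The cost of the search is dominated by the number of `fitness` calls (population size times generations), which with a real classifier means one training run per call.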

Finally, this dissertation demonstrates the danger that LLMs re-identify medical data while performing a simple text classification task, using quantized versions of four popular LLMs: Llama 2, Flan, Mistral, and Vicuna.
