Document Type
Thesis
Date of Award
5-31-2024
Degree Name
Master of Science in Data Science (M.S.)
Department
Data Science
First Advisor
James Geller
Second Advisor
Lijing Wang
Third Advisor
Akshay Rangamani
Abstract
Generative Artificial Intelligence has recently garnered enormous attention for a variety of reasons, among them its capacity to automate complex tasks. As the potential for integrating Generative Artificial Intelligence across technology, media, and healthcare grows, many tasks that previously relied on manual or algorithm-intensive methods can be simplified significantly. One such task is concept extraction, a labor-intensive process, especially when it is performed manually to build robust ontologies. Concept extraction involves analyzing text to identify and extract relevant concepts, key phrases, entities, and the relationships between these entities. This thesis explores various challenges of concept extraction from the ABCD (Adolescent Brain Cognitive Development) database, focusing on extracting concepts pertaining to Social Determinants of Health. It addresses language-specific text separation for data processing, the fine-tuning of GPT models, and tools for measuring semantic similarity. The thesis aims to enhance the understanding and application of Generative Artificial Intelligence in refining existing concept extraction processes.
Recommended Citation
Singh, Niharika, "Optimizing social determinant of health concept extraction" (2024). Theses. 2592.
https://digitalcommons.njit.edu/theses/2592