Concept placement using BERT trained by transforming and summarizing biomedical ontology structure
Document Type
Article
Publication Date
12-1-2020
Abstract
The comprehensive modeling and hierarchical positioning of a new concept in an ontology rely heavily on its set of proper subsumption relationships (IS-As) to other concepts. Identifying a concept's IS-A relationships is a laborious task that requires curators to have both domain knowledge and terminology skills. In this work, we propose a method, based on the language representation model BERT, to automatically predict the presence of IS-A relationships between a new concept and pre-existing concepts. The method converts the neighborhood network of a concept into “sentences” and harnesses BERT's Next Sentence Prediction (NSP) capability, which predicts whether two sentences are adjacent. To improve the method's performance, we refined the training data with an ontology summarization technique. We trained our model on the two largest hierarchies of the July 2017 release of SNOMED CT and applied it to predicting the parents of new concepts added in the January 2018 release. The results showed that our method achieved an average F1 score of 0.88, and that the ontology summarization technique improved the average recall slightly, from 0.94 to 0.96.
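To make the idea of turning an ontology neighborhood into NSP-style input more concrete, the following is a minimal sketch only. The toy hierarchy, the concept names, the serialization format (a concept name followed by the names of its direct parents), and the helper names concept_sentence and make_pairs are all illustrative assumptions; the abstract does not specify how the authors build their "sentences" or sample their training pairs.

```python
from itertools import product

# Toy IS-A hierarchy (child -> direct parents). Concept names are illustrative
# assumptions, not taken from the SNOMED CT releases used in the paper.
IS_A = {
    "Viral pneumonia": ["Pneumonia", "Viral infection"],
    "Pneumonia": ["Lung disease"],
    "Viral infection": ["Infectious disease"],
    "Lung disease": ["Disease"],
    "Infectious disease": ["Disease"],
    "Disease": [],
}


def concept_sentence(concept, is_a):
    """Serialize a concept and its direct parents into one text 'sentence'.

    Assumed format: the concept name followed by its sorted parent names.
    """
    parents = sorted(is_a.get(concept, []))
    return " ".join([concept] + parents)


def make_pairs(is_a):
    """Build (sentence_a, sentence_b, label) triples for all ordered pairs.

    label is 1 when the second concept is a direct parent of the first,
    otherwise 0. This labeling convention is an assumption of the sketch.
    """
    pairs = []
    for child, parent in product(is_a, repeat=2):
        if child == parent:
            continue
        label = 1 if parent in is_a[child] else 0
        pairs.append(
            (concept_sentence(child, is_a), concept_sentence(parent, is_a), label)
        )
    return pairs


pairs = make_pairs(IS_A)
positives = [p for p in pairs if p[2] == 1]
```

Triples of this shape could then be passed to a sentence-pair classifier such as a BERT NSP head. Enumerating every ordered pair is a simplification chosen here for brevity; a real pipeline would need a deliberate strategy for selecting negative examples.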
Identifier
85094138781 (Scopus)
Publication Title
Journal of Biomedical Informatics
External Full Text Location
https://doi.org/10.1016/j.jbi.2020.103607
ISSN
1532-0464
PubMed ID
33098987
Volume
112
Grant
UL1TR003017
Fund Ref
National Institutes of Health
Recommended Citation
Liu, Hao; Perl, Yehoshua; and Geller, James, "Concept placement using BERT trained by transforming and summarizing biomedical ontology structure" (2020). Faculty Publications. 4758.
https://digitalcommons.njit.edu/fac_pubs/4758