Quality assurance of complex ChEBI concepts based on number of relationship types
Document Type
Article
Publication Date
1-1-2019
Abstract
The Chemical Entities of Biological Interest (ChEBI) ontology is an important reference for applications dealing with chemical annotations and data mining. Modeling errors and inconsistencies in the large and complex ChEBI ontology are unavoidable, and they can adversely affect the applications that depend on it. We present a quality assurance (QA) methodology based on the correspondence between a concept's number of errors and its number of distinct relationship types, an intuitive measure of complexity. Specifically, we hypothesize that concepts with more relationship types tend to concentrate more errors. A study was carried out to assess this hypothesis. Two domain experts reviewed the correctness of a random sample of ChEBI concepts and formed a QA consensus report, which was then reviewed by a ChEBI curator. A two-tailed Fisher's exact test was performed on both the consensus report and the curator's report to test the hypothesis. Various kinds of errors, including both relationship and non-relationship errors, were discovered and reported to the ChEBI curator, who confirmed and corrected 65.8% of them. Our hypothesis was confirmed with statistical significance for both the domain experts' and the curator's reviews. Thus, ChEBI curators should employ a QA methodology that concentrates on concepts with many relationship types.
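The abstract tests its hypothesis with a two-tailed Fisher's exact test. As a rough illustration of that statistic only, the sketch below computes the two-tailed p-value for a 2x2 contingency table using just the Python standard library. It assumes a table layout that the abstract does not spell out: rows are hypothetical groups of sampled concepts ("many" vs. "few" relationship types, with the cut-off left unspecified), and columns are concepts judged erroneous vs. correct. The function names are illustrative, not part of any ChEBI tool, and the counts in the usage line are invented, not the study's data.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals `a` in a 2x2 table
    whose row-1 total, column-1 total, and grand total are fixed."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_tailed(table):
    """Two-tailed Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not the paper's definition):
      row 0 = concepts with many relationship types, row 1 = few;
      col 0 = concepts with errors,                  col 1 = without.

    The two-sided p-value sums the probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    # Feasible values of the top-left cell given the fixed margins.
    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    # A small relative tolerance keeps floating-point ties with p_obs.
    total = sum(
        p
        for p in (hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(fisher_exact_two_tailed([[8, 2], [1, 5]]))  # about 0.035
```

The exact test is a natural fit for this kind of QA study because sampled review sets tend to be small, where the chi-squared approximation becomes unreliable. In practice, SciPy's `scipy.stats.fisher_exact(table, alternative="two-sided")` computes the same two-sided p-value.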
Identifier
85071432044 (Scopus)
Publication Title
Applied Ontology
External Full Text Location
https://doi.org/10.3233/AO-190211
e-ISSN
18758533
ISSN
15705838
First Page
199
Last Page
214
Issue
3
Volume
14
Grant
R01CA190779
Fund Ref
National Institutes of Health
Recommended Citation
Yumak, Hasan; Zheng, Ling; Chen, Ling; Halper, Michael; Perl, Yehoshua; and Owen, Gareth, "Quality assurance of complex ChEBI concepts based on number of relationship types" (2019). Faculty Publications. 8102.
https://digitalcommons.njit.edu/fac_pubs/8102
