Histogram difference string distance for enhancing ontology integration in bioinformatics
Document Type
Conference Proceeding
Publication Date
12-1-2012
Abstract
Integration of bioinformatics ontologies is an important research task. This paper presents a family of new string distance methods for improving existing ontology integration and alignment techniques. A histogram, the main tool of the introduced methods, is an associative array that stores the number of occurrences of each character in a string. We use histogram difference in combination with Longest Common Prefix, TF-IDF, Smith-Waterman, and Jaccard re-scorers to define the four members of our family of string matching methods. We compare the performance of our methods with several well-known string matching algorithms, using five Gene Ontology datasets as test beds. Our methods outperformed those algorithms in average precision on four datasets and in maximum F1 measure on three datasets. On the remaining datasets, our results were among the best of all methods compared.
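The histogram described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact distance formula or the way the re-scorers are combined with it, so the function below is only an assumption: it takes the histogram difference to be the sum of absolute differences between per-character counts. The names `histogram` and `histogram_difference` are hypothetical and do not come from the paper.

```python
from collections import Counter


def histogram(s: str) -> Counter:
    """Associative array mapping each character of s to its occurrence count."""
    return Counter(s)


def histogram_difference(a: str, b: str) -> int:
    """Assumed distance: sum of absolute per-character count differences.

    This is an illustrative guess at a histogram difference, not the
    formula from the paper, which the abstract does not state.
    """
    ha, hb = histogram(a), histogram(b)
    # A Counter returns 0 for characters it has not seen, so the union
    # of keys covers characters that appear in only one of the strings.
    return sum(abs(ha[ch] - hb[ch]) for ch in ha.keys() | hb.keys())


if __name__ == "__main__":
    print(histogram("gene"))
    print(histogram_difference("kinase", "kinases"))
    print(histogram_difference("listen", "silent"))
```

Under this assumed definition, two anagrams have a difference of zero because character order is ignored, so a purely count-based score cannot tell them apart.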
Identifier
84883638335 (Scopus)
ISBN
9781618397461
Publication Title
4th International Conference on Bioinformatics and Computational Biology 2012, BICoB 2012
First Page
108
Last Page
113
Recommended Citation
Rudniy, Alex; Geller, James; and Song, Min, "Histogram difference string distance for enhancing ontology integration in bioinformatics" (2012). Faculty Publications. 17943.
https://digitalcommons.njit.edu/fac_pubs/17943
