Histogram difference string distance for enhancing ontology integration in bioinformatics

Document Type

Conference Proceeding

Publication Date

12-1-2012

Abstract

Integration of bioinformatics ontologies is an important research task. This paper presents a family of new string distance methods for improving existing ontology integration and alignment techniques. The main tool of these methods is the histogram, an associative array that stores the number of occurrences of each character in a string. We combine histogram difference with Longest Common Prefix, TFIDF, Smith-Waterman, and Jaccard re-scorers to define the four members of our family of string matching methods. We compare the performance of our methods with several well-known string matching algorithms, using five Gene Ontology datasets as test beds. Our methods outperformed those algorithms in average precision on four datasets and in maximum F1 measure on three datasets. On the remaining datasets, our results were among the best of the methods compared.
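
The character histogram described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact distance formula, so the definition used here (sum of absolute per-character count differences) and the length-based normalization are assumptions for illustration only, and the function names are hypothetical. The re-scorers mentioned in the abstract are not reproduced.

```python
from collections import Counter


def char_histogram(s: str) -> Counter:
    """Associative array mapping each character of s to its occurrence count."""
    return Counter(s)


def histogram_difference(a: str, b: str) -> int:
    """Assumed definition: sum of absolute differences of character counts.

    This is an illustrative guess, not the formula from the paper.
    """
    ha, hb = char_histogram(a), char_histogram(b)
    # A Counter returns 0 for characters it has not seen.
    return sum(abs(ha[c] - hb[c]) for c in ha.keys() | hb.keys())


def normalized_histogram_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - difference / (len(a) + len(b)), in [0, 1]."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - histogram_difference(a, b) / total


if __name__ == "__main__":
    print(histogram_difference("kinase", "kinases"))
    print(normalized_histogram_similarity("kinase", "kinases"))
```

Under this assumed definition the measure ignores character order, so two anagrams have a difference of zero. A count-only comparison therefore cannot separate such strings on its own.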

Identifier

84883638335 (Scopus)

ISBN

9781618397461

Publication Title

4th International Conference on Bioinformatics and Computational Biology 2012 (BICoB 2012)

First Page

108

Last Page

113
