Detecting duplicate biological entities using shortest path edit distance

Document Type

Article

Publication Date

1-1-2010

Abstract

Duplicate entity detection in biological data is an important research task. In this paper, we propose a novel, context-sensitive Shortest Path Edit Distance (SPED) that extends and supplements our previous work on Markov Random Field-based Edit Distance (MRFED). SPED transforms the edit distance computation into the calculation of the shortest path between two selected vertices of a graph. We produce several variants of SPED by applying Levenshtein, arithmetic mean, histogram difference and TF-IDF techniques to solve subtasks. We compare the performance of SPED with that of other well-known distance algorithms for biological entity matching. The experimental results show that SPED produces competitive outcomes. Copyright © 2010 Inderscience Enterprises Ltd.
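
The idea of casting edit distance as a shortest-path problem can be illustrated with the classic edit graph for plain Levenshtein distance. The sketch below is not the SPED algorithm itself: the abstract does not describe the paper's graph construction or its context-sensitive weights, so the unit edge costs, the function name `edit_distance_shortest_path`, and the use of Dijkstra's algorithm are all assumptions made for illustration. Vertex (i, j) means that the first i characters of one string have been aligned with the first j characters of the other, and the distance is the cost of the cheapest path from (0, 0) to (len(a), len(b)).

```python
import heapq


def edit_distance_shortest_path(a: str, b: str) -> int:
    """Levenshtein distance computed as a shortest path in the edit graph.

    Illustrative only: unit costs are an assumption, not the weights used by SPED.
    """
    source, target = (0, 0), (len(a), len(b))
    dist = {source: 0}
    heap = [(0, source)]  # priority queue of (cost so far, vertex)

    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist[(i, j)]:
            continue  # stale queue entry; a cheaper path was already found

        neighbours = []
        if i < len(a):
            neighbours.append(((i + 1, j), 1))  # delete a[i]
        if j < len(b):
            neighbours.append(((i, j + 1), 1))  # insert b[j]
        if i < len(a) and j < len(b):
            # match costs 0, substitution costs 1
            neighbours.append(((i + 1, j + 1), 0 if a[i] == b[j] else 1))

        for node, weight in neighbours:
            new_cost = d + weight
            if new_cost < dist.get(node, float("inf")):
                dist[node] = new_cost
                heapq.heappush(heap, (new_cost, node))

    raise RuntimeError("target vertex unreachable")  # cannot happen for this graph


if __name__ == "__main__":
    # Hypothetical entity names chosen only to exercise the function.
    print(edit_distance_shortest_path("BRCA1", "BRCA2"))
```

With unit costs this gives the same result as the standard dynamic-programming Levenshtein recurrence. A context-sensitive variant would presumably replace the fixed edge weights with weights that depend on the surrounding characters, but how the paper does this is not stated in the abstract.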

Identifier

77954714476 (Scopus)

Publication Title

International Journal of Data Mining and Bioinformatics

External Full Text Location

https://doi.org/10.1504/IJDMB.2010.034196

e-ISSN

1748-5681

ISSN

1748-5673

PubMed ID

20815139

First Page

395

Last Page

410

Issue

4

Volume

4
