Detecting duplicate biological entities using Markov random field-based edit distance

Document Type

Conference Proceeding

Publication Date

12-1-2008

Abstract

Detecting duplicate entities in biological data has become an important research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms, and that its performance is superior to token-based distance algorithms in the two matching tasks. © 2008 IEEE.
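
For orientation, the plain Needleman-Wunsch score that the abstract names as the starting point can be sketched as below. This is not MRFED or SoftMRFED: the abstract does not specify the MRF-based context weighting, so none is attempted here. The function name and the scoring parameters (match = 1, mismatch = -1, gap = -1) are illustrative assumptions, not values taken from the paper.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (higher means more similar).

    Baseline Needleman-Wunsch only; the scoring values are assumed defaults.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b. Row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] with the empty prefix of b
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical entity names, used only to exercise the function.
    print(needleman_wunsch_score("interleukin 2", "interleukin-2"))
```

Only two rows of the dynamic-programming table are kept, since the score alone (and not the alignment itself) is needed when the function serves as a string similarity measure.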

Identifier

58049138671 (Scopus)

ISBN

9780769534527

Publication Title

Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008

External Full Text Location

https://doi.org/10.1109/BIBM.2008.34

First Page

457

Last Page

460

Grant

0434581

Fund Ref

National Science Foundation
