Detecting duplicate biological entities using Markov random field-based edit distance
Document Type
Conference Proceeding
Publication Date
12-1-2008
Abstract
Detecting duplicate entities in biological data has become an important research task [3,5,7,8,9]. In this paper, we propose a novel context-sensitive Markov Random Field-based Edit Distance (MRFED). We apply Markov Random Field (MRF) theory to the Needleman-Wunsch (NW) distance and combine MRFED with TFIDF, a token-based distance algorithm, to obtain SoftMRFED. We evaluate SoftMRFED and other distance algorithms (Levenshtein, SoftTFIDF, and MongeElkan) on biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other distance algorithms and that its performance is superior to token-based distance algorithms in the two matching tasks. © 2008 IEEE.
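For readers unfamiliar with the baseline measures named in the abstract, the following is a minimal sketch of two of them: Levenshtein edit distance and a Monge-Elkan style token comparison. It does not implement MRFED or SoftMRFED, whose details are not given in this record. The function names and the length-normalized similarity are illustrative assumptions, not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def monge_elkan(s: str, t: str) -> float:
    """Average, over tokens of s, of the best character-level match among tokens of t."""
    s_tokens, t_tokens = s.lower().split(), t.lower().split()
    if not s_tokens or not t_tokens:
        return 0.0
    best = [max(char_similarity(x, y) for y in t_tokens) for x in s_tokens]
    return sum(best) / len(s_tokens)
```

A token-level measure of this kind is insensitive to word order, so monge_elkan("protein kinase C", "kinase C protein") scores the two names as identical, whereas a plain character-level edit distance over the whole strings would not.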
Identifier
58049138671 (Scopus)
ISBN
9780769534527
Publication Title
Proceedings, IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008
External Full Text Location
https://doi.org/10.1109/BIBM.2008.34
First Page
457
Last Page
460
Grant
0434581
Fund Ref
National Science Foundation
Recommended Citation
Song, Min and Rudniy, Alex, "Detecting duplicate biological entities using Markov random field-based edit distance" (2008). Faculty Publications. 12441.
https://digitalcommons.njit.edu/fac_pubs/12441
