Detecting duplicate biological entities using Markov random field-based edit distance
Document Type
Article
Publication Date
1-1-2010
Abstract
Detecting duplicate entities in biological data is an important research task. In this paper, we propose a novel and context-sensitive Markov random field-based edit distance (MRFED) for this task. We apply the Markov random field theory to the Needleman-Wunsch distance and combine MRFED with TFIDF, a token-based distance algorithm, resulting in SoftMRFED. We compare SoftMRFED with other distance algorithms such as Levenshtein, SoftTFIDF, and Monge-Elkan for two matching tasks: biological entity matching and synonym matching. The experimental results show that SoftMRFED significantly outperforms the other edit distance algorithms on several test data collections. In addition, the performance of SoftMRFED is superior to token-based distance algorithms in two matching tasks. © 2009 Springer-Verlag London Limited.
Identifier
78049440735 (Scopus)
Publication Title
Knowledge and Information Systems
External Full Text Location
https://doi.org/10.1007/s10115-009-0254-7
e-ISSN
0219-3116
ISSN
0219-1377
First Page
371
Last Page
387
Issue
2
Volume
25
Grant
0434581
Fund Ref
National Science Foundation
Recommended Citation
Song, Min and Rudniy, Alex, "Detecting duplicate biological entities using Markov random field-based edit distance" (2010). Faculty Publications. 6475.
https://digitalcommons.njit.edu/fac_pubs/6475