Detecting duplicate biological entities using shortest path edit distance
Document Type
Article
Publication Date
1-1-2010
Abstract
Duplicate entity detection in biological data is an important research task. In this paper, we propose a novel, context-sensitive Shortest Path Edit Distance (SPED) that extends and supplements our previous work on Markov Random Field-based Edit Distance (MRFED). SPED transforms the edit distance computation into the problem of finding the shortest path between two selected vertices of a graph. We develop several variants of SPED by applying Levenshtein, arithmetic mean, histogram difference and TFIDF techniques to solve its subtasks. We compare SPED's performance to that of other well-known distance algorithms for biological entity matching. The experimental results show that SPED produces competitive outcomes. Copyright © 2010 Inderscience Enterprises Ltd.
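The abstract states that SPED recasts edit distance as a shortest-path computation between two vertices of a graph. The paper's actual graph construction and context-sensitive weights are not described in this record, so what follows is only a minimal Python sketch of the classic, well-known special case: the Levenshtein edit graph. In it, vertex (i, j) means the first i characters of one string have been aligned with the first j characters of the other, and the cheapest path from (0, 0) to (len(a), len(b)) equals the Levenshtein distance. The names `shortest_path_edit_distance`, `unit_substitution` and `unit_indel` are hypothetical; the pluggable cost functions merely suggest where context-sensitive weights could enter, and are not the authors' method.

```python
import heapq
from typing import Callable

# Hypothetical cost-function type: (a, b, i, j) -> non-negative cost.
# Plain Levenshtein uses unit costs; a context-sensitive scheme could inspect
# neighbouring characters through i and j. Illustrative only, not SPED itself.
CostFn = Callable[[str, str, int, int], float]


def unit_substitution(a: str, b: str, i: int, j: int) -> float:
    """Cost of aligning a[i] with b[j]: 0 for a match, 1 for a substitution."""
    return 0.0 if a[i] == b[j] else 1.0


def unit_indel(a: str, b: str, i: int, j: int) -> float:
    """Cost of one insertion or deletion (ignores i and j, which may be at the end)."""
    return 1.0


def shortest_path_edit_distance(
    a: str,
    b: str,
    sub_cost: CostFn = unit_substitution,
    indel_cost: CostFn = unit_indel,
) -> float:
    """Edit distance as the shortest path from (0, 0) to (len(a), len(b))
    in the edit graph, found with Dijkstra's algorithm (costs must be >= 0)."""
    m, n = len(a), len(b)
    target = (m, n)
    dist = {(0, 0): 0.0}
    heap = [(0.0, 0, 0)]  # (distance so far, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale heap entry; a shorter path was already found
        moves = []
        if i < m:  # delete a[i]
            moves.append((i + 1, j, indel_cost(a, b, i, j)))
        if j < n:  # insert b[j]
            moves.append((i, j + 1, indel_cost(a, b, i, j)))
        if i < m and j < n:  # match or substitute a[i] -> b[j]
            moves.append((i + 1, j + 1, sub_cost(a, b, i, j)))
        for ni, nj, w in moves:
            nd = d + w
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return dist[target]  # not reached in practice: (m, n) is always reachable


print(shortest_path_edit_distance("kitten", "sitting"))  # 3.0
```

With unit costs this reproduces ordinary Levenshtein distance; Dijkstra is used instead of the usual dynamic-programming table only to make the shortest-path view explicit, and it remains correct for any non-negative cost functions passed in.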
Identifier
77954714476 (Scopus)
Publication Title
International Journal of Data Mining and Bioinformatics
External Full Text Location
https://doi.org/10.1504/IJDMB.2010.034196
e-ISSN
17485681
ISSN
17485673
PubMed ID
20815139
First Page
395
Last Page
410
Issue
4
Volume
4
Recommended Citation
Rudniy, Alex; Song, Min; and Geller, James, "Detecting duplicate biological entities using shortest path edit distance" (2010). Faculty Publications. 6458.
https://digitalcommons.njit.edu/fac_pubs/6458