An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees
Document Type
Article
Publication Date
12-1-1998
Abstract
Ordered, labeled trees are trees in which each node has a label and the left-to-right order of its children (if it has any) is fixed. Such trees have many applications in vision, pattern recognition, molecular biology and natural language processing. We consider a substructure of an ordered labeled tree T to be a connected subgraph of T. Given two ordered labeled trees T1 and T2 and an integer d, the largest approximately common substructure problem is to find a substructure U1 of T1 and a substructure U2 of T2 such that U1 is within edit distance d of U2 and where there does not exist any other substructure V1 of T1 and V2 of T2 such that V1 and V2 satisfy the distance constraint and the sum of the sizes of V1 and V2 is greater than the sum of the sizes of U1 and U2. We present a dynamic programming algorithm to solve this problem, which runs as fast as the fastest known algorithm for computing the edit distance of two trees when the distance allowed in the common substructures is a constant independent of the input trees. To demonstrate the utility of our algorithm, we discuss its application to discovering motifs in multiple RNA secondary structures (which are ordered labeled trees). © 1998 IEEE.
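The problem statement in the abstract can be restated compactly as a constrained maximization. The following is only a sketch of that restatement, not notation taken from the article: the symbol \delta for the tree edit distance and |V| for the size of a substructure (presumably its number of nodes) are assumptions introduced here for illustration.

```latex
% Assumed notation (not from the article):
%   \delta(V_1, V_2)  edit distance between substructures V_1 and V_2
%   |V|               size of substructure V
% Requires amsmath for \operatorname*.
\[
(U_1, U_2) \in \operatorname*{arg\,max}_{(V_1, V_2)} \bigl( |V_1| + |V_2| \bigr)
\quad \text{subject to} \quad
V_1 \text{ a connected subgraph of } T_1,\;
V_2 \text{ a connected subgraph of } T_2,\;
\delta(V_1, V_2) \le d .
\]
```

Written this way, the distance bound d acts as a constraint rather than as part of the objective: among all pairs of substructures within edit distance d of each other, the problem asks for a pair whose combined size is largest.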
Identifier
0032136849 (Scopus)
Publication Title
IEEE Transactions on Pattern Analysis and Machine Intelligence
External Full Text Location
https://doi.org/10.1109/34.709622
ISSN
01628828
First Page
889
Last Page
895
Issue
8
Volume
20
Grant
IRI-9224601
Fund Ref
National Science Foundation
Recommended Citation
Wang, Jason T.L.; Shapiro, Bruce A.; Shasha, Dennis; Zhang, Kaizhong; and Currey, Kathleen M., "An Algorithm for Finding the Largest Approximately Common Substructures of Two Trees" (1998). Faculty Publications. 16292.
https://digitalcommons.njit.edu/fac_pubs/16292
