ATreeGrep: Approximate searching in unordered trees
Document Type
Conference Proceeding
Publication Date
1-1-2002
Abstract
An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that "approximately" contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.
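The path-counting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it makes several assumptions: a tree is a `(label, children)` tuple, a query path "matches" when it occurs as a contiguous run inside some root-to-leaf path of the data tree, path multiplicities are ignored, and a plain linear scan stands in for the suffix array. The filter shown is a simple set of parent-child label pairs, used here as a stand-in for the hash-based technique the abstract mentions. All function names are hypothetical.

```python
# Illustrative sketch only; names and simplifications are assumptions, not ATreeGrep's API.

def root_to_leaf_paths(tree):
    """Return every root-to-leaf label path of a (label, children) tree."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + p for child in children for p in root_to_leaf_paths(child)]


def contains_path(data_path, query_path):
    """True if query_path occurs as a contiguous run inside data_path."""
    n, m = len(data_path), len(query_path)
    return any(data_path[i:i + m] == query_path for i in range(n - m + 1))


def path_mismatches(query_paths, data_paths):
    """Count query paths that occur in no data path (linear scan, no suffix array)."""
    return sum(
        1 for q in query_paths
        if not any(contains_path(d, q) for d in data_paths)
    )


def edge_set(tree):
    """Collect all (parent label, child label) pairs of a tree."""
    label, children = tree
    edges = set()
    for child in children:
        edges.add((label, child[0]))
        edges |= edge_set(child)
    return edges


def filter_lower_bound(query_paths, data_edges):
    """Count query paths using an edge absent from the data tree.

    Such a path cannot match, so this never exceeds the true mismatch count.
    """
    return sum(
        1 for q in query_paths
        if any(edge not in data_edges for edge in zip(q, q[1:]))
    )


def search(database, query, max_mismatches):
    """Return names of data trees with at most max_mismatches unmatched query paths."""
    query_paths = root_to_leaf_paths(query)
    hits = []
    for name, data in database.items():
        # Cheap filter first: discard trees that cannot qualify.
        if filter_lower_bound(query_paths, edge_set(data)) > max_mismatches:
            continue
        if path_mismatches(query_paths, root_to_leaf_paths(data)) <= max_mismatches:
            hits.append(name)
    return hits
```

Because sibling order never enters the path sets, a query such as `("b", [("d", []), ("c", [])])` is treated the same as one with its children swapped. In this sketch the filter only skips work and cannot change the result, since it underestimates the mismatch count.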
Identifier
84948667724 (Scopus)
ISBN
0769516327
Publication Title
Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
External Full Text Location
https://doi.org/10.1109/SSDM.2002.1029709
ISSN
1099-3371
First Page
89
Last Page
98
Volume
2002-January
Grant
IIS-9988345
Recommended Citation
Shasha, D.; Wang, J. T.L.; Shan, Huiyuan; and Zhang, Kaizhong, "ATreeGrep: Approximate searching in unordered trees" (2002). Faculty Publications. 14989.
https://digitalcommons.njit.edu/fac_pubs/14989
