Mining frequent agreement subtrees in phylogenetic databases
Document Type
Conference Proceeding
Publication Date
1-1-2006
Abstract
We present a new data mining problem to discover frequent agreement subtree patterns from a database of rooted phylogenetic trees. This problem is a natural extension of the traditional MAST (maximum agreement subtree) problem. To solve the problem, we first present a novel canonical form for leaf-labeled trees and an efficient tree expansion algorithm for generating candidate subtrees level by level. We then show how to efficiently discover all frequent agreement subtrees from a given set of phylogenetic trees, through an Apriori-like data mining approach. We discuss the correctness and completeness of the proposed method. Experimental results demonstrate that the proposed method can discover interesting patterns from different phylogenetic trees for multiple species. The algorithms were implemented in C++ and integrated into an online toolkit, which is fully operational and accessible on the World Wide Web.
Identifier
33745457798 (Scopus)
ISBN
[089871611X, 9780898716115]
Publication Title
Proceedings of the Sixth SIAM International Conference on Data Mining
External Full Text Location
https://doi.org/10.1137/1.9781611972764.20
First Page
222
Last Page
233
Volume
2006
Recommended Citation
Zhang, Sen and Wang, Jason T.L., "Mining frequent agreement subtrees in phylogenetic databases" (2006). Faculty Publications. 19216.
https://digitalcommons.njit.edu/fac_pubs/19216
