Mining frequent agreement subtrees in phylogenetic databases

Document Type

Conference Proceeding

Publication Date

1-1-2006

Abstract

We present a new data mining problem to discover frequent agreement subtree patterns from a database of rooted phylogenetic trees. This problem is a natural extension of the traditional MAST (maximum agreement subtree) problem. To solve the problem, we first present a novel canonical form for leaf-labeled trees and an efficient tree expansion algorithm for generating candidate subtrees level by level. We then show how to efficiently discover all frequent agreement subtrees from a given set of phylogenetic trees, through an Apriori-like data mining approach. We discuss the correctness and completeness of the proposed method. Experimental results demonstrate that the proposed method can discover interesting patterns from different phylogenetic trees for multiple species. The algorithms were implemented in C++ and integrated into an online toolkit, which is fully operational and accessible on the World Wide Web.
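The abstract describes three ideas: a canonical form for leaf-labeled trees, level-by-level candidate generation, and Apriori-style support counting over a database of rooted trees. The sketch below illustrates how these ideas fit together. It is not the authors' method. It makes several hypothetical choices:

- Trees are nested Python tuples whose leaves are strings without `(`, `)` or `,`.
- The canonical form is a generic one that sorts child strings. It is not the canonical form proposed in the paper.
- Candidates grow one leaf at a time over leaf sets. This replaces the paper's tree expansion algorithm.
- A pattern's support is the number of database trees whose topological restriction to the pattern's leaf set has the same canonical form.

The names `leaves`, `restrict`, `canon` and `mine` are illustrative only.

```python
from collections import Counter

# Hypothetical representation (assumption): a leaf is a str label,
# an internal node is a tuple of child subtrees.

def leaves(tree):
    """Return the set of leaf labels in `tree`."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(c) for c in tree))

def restrict(tree, keep):
    """Topological restriction of `tree` to the leaf set `keep`: drop all
    other leaves, then suppress internal nodes left with a single child.
    Returns None if no leaf of `keep` occurs in `tree`."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [r for r in (restrict(c, keep) for c in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # suppress the now-unary node
    return tuple(kids)

def canon(tree):
    """Generic order-independent string for a rooted leaf-labeled tree,
    built by sorting the children's strings (not the paper's canonical form)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canon(c) for c in tree)) + ")"

def mine(db, minsup):
    """Apriori-style sketch: map each canonical agreement subtree shared
    (as a restriction) by at least `minsup` trees to its support."""
    tree_leaves = [leaves(t) for t in db]
    # Level 1: single leaves that occur in at least minsup trees.
    counts = Counter(x for ls in tree_leaves for x in ls)
    level = {frozenset([x]) for x, c in counts.items() if c >= minsup}
    result = {x: c for x, c in counts.items() if c >= minsup}
    k = 1
    while level:
        # Join: unite pairs of frequent k-leaf sets that differ by one leaf.
        cands = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: if a pattern on S is frequent, its restriction to any
        # subset of S is shared by the same trees, so every k-subset
        # must already be frequent.
        cands = {s for s in cands if all(s - {x} in level for x in s)}
        next_level = set()
        for s in cands:
            # Group each qualifying tree's restriction to s by canonical form.
            sup = Counter(canon(restrict(t, s))
                          for t, ls in zip(db, tree_leaves) if s <= ls)
            for pat, c in sup.items():
                if c >= minsup:
                    result[pat] = c
                    next_level.add(s)
        level = next_level
        k += 1
    return result
```

As an example, take the trees `(("a","b"),("c","d"))`, `(("a","b"),("c",("d","e")))` and `(("a","c"),("b","d"))` with `minsup = 2`. The largest pattern reported is `((a,b),(c,d))`, shared by the first two trees. Restricting the second tree to {a, b, c, d} removes `e` and suppresses its parent. The topology `((a,c),b)` is not reported because only the third tree supports it.

Pruning on leaf sets, rather than on individual topologies, is a simplification. It discards no frequent pattern, but it counts more candidates than a pattern-level expansion would.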

Identifier

33745457798 (Scopus)

ISBN

089871611X, 9780898716115

Publication Title

Proceedings of the Sixth SIAM International Conference on Data Mining

External Full Text Location

https://doi.org/10.1137/1.9781611972764.20

First Page

222

Last Page

233

Volume

2006

This document is currently not available here.