Homomorphic Pattern Mining from a Single Large Data Tree
Document Type
Article
Publication Date
12-1-2016
Abstract
Finding interesting tree patterns hidden in large datasets is a central topic in data mining with many practical applications. Unfortunately, previous contributions have focused almost exclusively on mining induced patterns from a set of small trees. The problem of mining homomorphic patterns from a large data tree has been neglected, mainly because of the challenging unbounded redundancy that homomorphic tree patterns can display. However, mining homomorphic patterns allows for discovering large patterns that cannot be extracted when mining induced or embedded patterns. Large patterns better characterize big trees, which are important for many modern applications, in particular with the explosion of big data. In this paper, we address the problem of mining frequent homomorphic tree patterns from a single large tree. We propose a novel approach that extracts non-redundant maximal homomorphic patterns. Our approach employs an incremental frequency computation method that avoids the costly enumeration of all pattern matchings required by previous approaches. Matching information of already computed patterns is materialized as bitmaps, a technique that minimizes not only memory consumption but also CPU time. Our contribution also includes an optimization technique that can further reduce the search space of homomorphic patterns. We conducted detailed experiments to test the performance and scalability of our approach. The experimental evaluation shows that our approach mines larger patterns and extracts maximal homomorphic patterns from real and synthetic datasets, outperforming state-of-the-art embedded tree mining algorithms applied to a large data tree.
Identifier
85057236911 (Scopus)
Publication Title
Data Science and Engineering
External Full Text Location
https://doi.org/10.1007/s41019-016-0028-7
e-ISSN
2364-1541
ISSN
2364-1185
First Page
203
Last Page
218
Issue
4
Volume
1
Grant
61202035
Fund Ref
National Natural Science Foundation of China
Recommended Citation
Wu, Xiaoying and Theodoratos, Dimitri, "Homomorphic Pattern Mining from a Single Large Data Tree" (2016). Faculty Publications. 10123.
https://digitalcommons.njit.edu/fac_pubs/10123
