Discovering closed and maximal embedded patterns from large tree data
Document Type
Article
Publication Date
5-1-2021
Abstract
Many current applications and systems produce large tree datasets and export, exchange, and represent data in tree-structured form. Extracting informative patterns from large data trees is an important research direction with multiple applications in practice. Pattern mining research initially focused on mining induced patterns and gradually evolved into mining embedded patterns. A well-known problem of frequent pattern mining is the huge number of patterns it produces. This affects not only the efficiency but also the effectiveness of mining. A typical solution to this problem is to summarize frequent patterns through closed and maximal patterns. No previous work addresses the problem of mining closed and/or maximal embedded tree patterns, not even in the framework of mining multiple small trees. We address the problem of summarizing embedded tree patterns extracted from large data trees, by defining and mining closed and maximal embedded unordered tree patterns. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called closedEmbTM-eager as it eagerly eliminates non-closed patterns. To mitigate the generation of intermediate patterns, we devise pattern search space pruning rules to proactively detect and prune branches in the pattern search space which do not correspond to closed patterns. The pruning rules are accommodated into the extended embedded pattern miner to produce a new algorithm, called closedEmbTM-prune, for mining all the closed and maximal embedded frequent patterns. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, closedEmbTM-prune not only generates a complete closed and maximal pattern set which is substantially smaller than that generated by the embedded pattern miner, but also runs much faster with negligible overhead on pattern pruning.
Identifier
85105693215 (Scopus)
Publication Title
Data and Knowledge Engineering
External Full Text Location
https://doi.org/10.1016/j.datak.2021.101890
ISSN
0169023X
Volume
133
Grant
61872276
Fund Ref
National Natural Science Foundation of China
Recommended Citation
Wu, Xiaoying; Theodoratos, Dimitri; and Mamoulis, Nikos, "Discovering closed and maximal embedded patterns from large tree data" (2021). Faculty Publications. 4139.
https://digitalcommons.njit.edu/fac_pubs/4139