Efficiently discovering most-specific mixed patterns from large data trees

Document Type

Conference Proceeding

Publication Date

1-1-2017

Abstract

Discovering informative tree patterns hidden in large datasets is an important research area with many practical applications. Over the years, research has evolved from mining induced patterns to mining embedded patterns. Mixed patterns capture all the information that embedded or induced patterns capture, plus more detailed information that neither of the two can extract. Unfortunately, the problem of extracting unconstrained mixed patterns from data trees has not been addressed until now. In this paper, we address the problem of mining unordered frequent mixed patterns from large trees. We propose a novel approach that extracts most-specific mixed patterns without redundancy. Our approach uses effective pruning techniques to reduce the pattern search space. It exploits efficient homomorphic pattern matching algorithms to compute pattern support incrementally, avoiding the costly enumeration of all pattern matchings required by older approaches. An extensive experimental evaluation shows that our approach not only mines mixed patterns from real and synthetic datasets up to several orders of magnitude faster than older state-of-the-art embedded tree mining algorithms applied to large data trees, but also scales well, enabling the extraction of informative mixed patterns from large datasets for which no previous approaches exist.

Identifier

85032275188 (Scopus)

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

External Full Text Location

https://doi.org/10.1007/978-3-319-55753-3_18

e-ISSN

1611-3349

ISSN

0302-9743

First Page

279

Last Page

294

Volume

10177 LNCS

Grant

61202035

Fund Ref

National Natural Science Foundation of China
