Profiling-Based Big Data Workflow Optimization in a Cross-layer Coupled Design Framework
Document Type
Conference Proceeding
Publication Date
1-1-2020
Abstract
Big data processing and analysis increasingly rely on workflow technologies for knowledge discovery and scientific innovation. The execution of big data workflows is now commonly supported on reliable and scalable data storage and computing platforms such as Hadoop. There are a variety of factors affecting workflow performance across multiple layers of big data systems, including the inherent properties (such as scale and topology) of the workflow, the parallel computing engine it runs on, the resource manager that orchestrates distributed resources, the file system that stores data, as well as the parameter setting of each layer. Optimizing workflow performance is challenging because the compound effects of the aforementioned layers are complex and opaque to end users. Generally, tuning their parameters requires an in-depth understanding of big data systems, and the default settings do not always yield optimal performance. We propose a profiling-based cross-layer coupled design framework to determine the best parameter setting for each layer in the entire technology stack to optimize workflow performance. To tackle the large parameter space, we reduce the number of experiments needed for profiling with two approaches: i) identify a subset of critical parameters with the most significant influence through feature selection; and ii) minimize the search process within the value range of each critical parameter using stochastic approximation. Experimental results show that the proposed optimization framework provides the most suitable parameter settings for a given workflow to achieve the best performance. This profiling-based method could be used by end users and service providers to configure and execute large-scale workflows in complex big data systems.
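The second approach named in the abstract, searching a critical parameter's value range by stochastic approximation, can be illustrated with a minimal sketch. The abstract does not say which stochastic-approximation variant the authors use; the code below assumes simultaneous-perturbation stochastic approximation (SPSA) as one common member of that family. The function `noisy_runtime`, its two normalized parameters, its optimum at (0.7, 0.3), and all gain constants are hypothetical stand-ins for a real profiling run, not values from the paper.

```python
import random


def clip(values, bounds):
    """Keep every parameter inside its allowed [low, high] range."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(values, bounds)]


def spsa_minimize(measure, theta0, bounds, iterations=300,
                  a=0.005, c=0.1, stability=10, seed=0):
    """SPSA sketch: two noisy measurements per iteration, for any dimension.

    All gain constants (a, c, stability) are illustrative assumptions.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(1, iterations + 1):
        a_k = a / (k + stability) ** 0.602   # decaying step size
        c_k = c / k ** 0.101                 # decaying perturbation size
        # Perturb all parameters at once with random +/-1 directions.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = clip([t + c_k * d for t, d in zip(theta, delta)], bounds)
        minus = clip([t - c_k * d for t, d in zip(theta, delta)], bounds)
        diff = measure(plus) - measure(minus)
        # Gradient estimate built from the single measured difference.
        grad = [diff / (2.0 * c_k * d) for d in delta]
        theta = clip([t - a_k * g for t, g in zip(theta, grad)], bounds)
    return theta


def make_noisy_runtime(seed=1):
    """Hypothetical stand-in for one profiling run of a workflow."""
    noise = random.Random(seed)

    def noisy_runtime(params):
        x0, x1 = params  # two normalized, made-up critical parameters
        base = 100.0 + 50.0 * (x0 - 0.7) ** 2 + 80.0 * (x1 - 0.3) ** 2
        return base + noise.gauss(0.0, 0.05)  # measurement noise

    return noisy_runtime


best = spsa_minimize(make_noisy_runtime(), theta0=[0.5, 0.5],
                     bounds=[(0.0, 1.0), (0.0, 1.0)])
print("estimated best setting:", best)
```

Each iteration costs two measurements regardless of how many parameters are tuned, which is why a method of this kind suits settings where every measurement is a full workflow execution. In a real deployment, `measure` would launch the workflow with the given configuration and return its observed execution time.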
Identifier
85092686840 (Scopus)
ISBN
9783030602475
Publication Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
External Full Text Location
https://doi.org/10.1007/978-3-030-60248-2_14
e-ISSN
1611-3349
ISSN
0302-9743
First Page
197
Last Page
217
Volume
12454 LNCS
Grant
CNS-1828123
Fund Ref
National Science Foundation
Recommended Citation
Ye, Qianwen; Wu, Chase Q.; Liu, Wuji; Hou, Aiqin; and Shen, Wei, "Profiling-Based Big Data Workflow Optimization in a Cross-layer Coupled Design Framework" (2020). Faculty Publications. 5744.
https://digitalcommons.njit.edu/fac_pubs/5744
