Energy-efficient mapping of large-scale workflows under deadline constraints in big data computing systems
Document Type
Article
Publication Date
9-1-2020
Abstract
Large-scale workflows for big data analytics have become a main consumer of energy in data centers where moldable parallel computing models such as MapReduce are widely applied to meet high computational demands with time-varying computing resources. The granularity of task partitioning in each moldable job of such big data workflows has a significant impact on energy efficiency, which remains largely unexplored. In this paper, we analyze the properties of moldable jobs and formulate a workflow mapping problem to minimize the dynamic energy consumption of a given workflow request under a deadline constraint in big data systems. Since this problem is strongly NP-hard, we design a fully polynomial-time approximation scheme (FPTAS) for a special case with a pipeline-structured workflow on a homogeneous cluster and a heuristic for the generalized problem with an arbitrary workflow on a heterogeneous cluster. Extensive simulation results show that the proposed solution outperforms existing algorithms in terms of dynamic energy saving and deadline missing rate, and these results are further validated by a real-life workflow implementation and experiments in Hadoop/YARN systems.
Identifier
85044507990 (Scopus)
Publication Title
Future Generation Computer Systems
External Full Text Location
https://doi.org/10.1016/j.future.2017.07.050
ISSN
0167-739X
First Page
515
Last Page
530
Volume
110
Grant
CNS-1560698
Fund Ref
National Science Foundation
Recommended Citation
Shu, Tong and Wu, Chase Q., "Energy-efficient mapping of large-scale workflows under deadline constraints in big data computing systems" (2020). Faculty Publications. 5032.
https://digitalcommons.njit.edu/fac_pubs/5032
