Performance optimization of Hadoop workflows in public clouds through adaptive task partitioning
Document Type
Conference Proceeding
Publication Date
10-2-2017
Abstract
Cloud computing provides a cost-effective platform for big data workflows, where moldable parallel computing models such as MapReduce are widely applied to meet stringent performance requirements. The granularity of task partitioning in each moldable job has a significant impact on workflow completion time and financial cost. We investigate the properties of moldable jobs and design a big-data workflow mapping model, based on which we formulate a workflow mapping problem to minimize workflow makespan under a budget constraint in public clouds. We show this problem to be strongly NP-complete and design i) a fully polynomial-time approximation scheme (FPTAS) for a special case with a pipeline-structured workflow executed on virtual machines of a single class, and ii) a heuristic for the generalized problem with an arbitrary directed acyclic graph-structured workflow executed on virtual machines of multiple classes. Extensive simulation-based results in Hadoop/YARN illustrate the performance superiority of the proposed solution over existing workflow mapping models and algorithms.
Identifier
85034022406 (Scopus)
ISBN
978-1-5090-5336-0
Publication Title
Proceedings IEEE INFOCOM
External Full Text Location
https://doi.org/10.1109/INFOCOM.2017.8057204
ISSN
0743-166X
Grant
61472320
Fund Ref
National Science Foundation
Recommended Citation
Shu, Tong and Wu, Chase Q., "Performance optimization of Hadoop workflows in public clouds through adaptive task partitioning" (2017). Faculty Publications. 9266.
https://digitalcommons.njit.edu/fac_pubs/9266
