On Performance Modeling and Prediction for Spark-HBase Applications in Big Data Systems

Document Type

Conference Proceeding

Publication Date

1-1-2022

Abstract

Many large-scale applications in business and scientific domains require both parallel computing and distributed data management for big data processing. One typical scenario is the use of the Spark computing engine to process large amounts of data managed by HBase in Hadoop. Such computing workflows provide an opportunity to optimize application performance through strategic resource allocation with suitable parameter settings. Recommending optimal system configurations to end users therefore requires accurate modeling and prediction of application performance. However, this is a challenging problem for multiple reasons, mainly the large parameter space and the dynamic interactions between different technology layers of big data systems. In this paper, we propose a class of regression-based machine learning models to predict the execution performance of Spark-HBase applications in Hadoop. We first explore and identify an exhaustive set of system parameters across multiple layers, including Spark and HBase, and then conduct an in-depth exploratory analysis of their effects on the execution time of Spark-HBase applications. Based on these analysis results, we design a performance predictor using regression-based machine learning algorithms. Experimental results show that the resulting predictor achieves high accuracy across the different algorithms compared. The proposed approach can facilitate automatic system configuration and has the potential to be applied to other similar systems for big data processing.
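The abstract does not specify the models, features, or data used, so the following is only a minimal sketch of the general idea of a regression-based performance predictor, not the authors' method. It fits ordinary least squares to map a few configuration values to execution time. The three feature columns (executor cores, executor memory in GB, scanner caching) are assumed for illustration, and every number is synthetic, generated from a made-up linear rule.

```python
# Illustrative sketch only: ordinary least squares on synthetic data.
# Feature choices and all numbers are hypothetical, not taken from the paper.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Return [intercept, w1, ..., wd] minimising the squared error."""
    rows = [[1.0] + list(f) for f in features]  # prepend the bias term
    d = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Predicted execution time for one configuration."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


# Hypothetical configurations: (executor cores, executor memory GB, scanner caching)
configs = [
    (2, 4, 100), (4, 4, 500), (2, 8, 500),
    (4, 8, 100), (8, 16, 1000), (8, 4, 100),
]
# Synthetic execution times in seconds, generated from the made-up rule
# time = 200 - 10*cores - 2*memory - 0.05*caching (no noise added).
times = [167.0, 127.0, 139.0, 139.0, 38.0, 107.0]

weights = fit_linear_regression(configs, times)
predicted = predict(weights, (4, 8, 500))  # an unseen configuration
```

Because the synthetic times contain no noise, the fitted weights approximately recover the made-up rule. Measured execution times would be noisy and likely nonlinear in the parameters, which is one reason to compare several regression algorithms rather than rely on a single linear fit.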

Identifier

85137273638 (Scopus)

ISBN

9781538683477

Publication Title

IEEE International Conference on Communications

External Full Text Location

https://doi.org/10.1109/ICC45855.2022.9838762

ISSN

1550-3607

First Page

3685

Last Page

3690

Volume

2022-May
