Semantics-aware prediction for analytic queries in MapReduce environment

Document Type

Conference Proceeding

Publication Date

8-13-2018

Abstract

MapReduce has emerged as a powerful data processing engine that supports large-scale complex analytics applications, most of which are written in declarative query languages such as HiveQL and Pig Latin. Analytic queries are typically compiled into execution plans in the form of directed acyclic graphs (DAGs) of MapReduce jobs. Jobs in the DAGs are dispatched to the MapReduce processing engine as soon as their dependencies are satisfied. MapReduce adopts a job-level scheduling policy to strive for balanced distribution of tasks and effective utilization of resources. However, the purely task-based scheduling algorithms lack query-level semantics, resulting in resource thrashing among queries and an overall degradation of performance. Therefore, we introduce a semantics-aware query prediction framework to address these problems systematically. Our framework includes three major techniques: cross-layer semantics percolation, selectivity estimation, and multivariate time prediction for analytic queries. Multivariate query prediction allows us not only to gauge the dynamic size of analytics datasets, but also to accurately predict the resource usage (e.g., numbers of map and reduce tasks) of individual MapReduce jobs and whole queries. In addition, the accurate prediction and queuing of queries can potentially be exploited by Hadoop scheduling to optimize overall query performance. Based on the query prediction, our case study scheduler demonstrates a significant performance improvement over traditional Hadoop schedulers.
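
As an illustration of the dispatch rule described in the abstract (a job is released once all of its dependencies are satisfied), the following is a minimal sketch, not the authors' implementation. The function name `dispatch_order`, the job names, and the simplification that all jobs released together also finish together are assumptions made for this example only.

```python
def dispatch_order(deps):
    """Group the jobs of a query plan into dispatch waves.

    `deps` maps each job name to the set of job names it depends on.
    Every dependency must itself appear as a key (hypothetical input format).
    Simplifying assumption: all jobs in a wave finish before the next wave.
    """
    # Copy the dependency sets so the caller's data is not modified.
    remaining = {job: set(parents) for job, parents in deps.items()}

    # Reverse edges: for each job, the jobs that wait on it.
    children = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            children[parent].append(job)

    # Jobs with no dependencies can be dispatched immediately.
    ready = sorted(job for job, parents in remaining.items() if not parents)
    waves = []
    dispatched = 0

    while ready:
        waves.append(ready)
        next_ready = []
        for job in ready:
            dispatched += 1
            for child in children[job]:
                remaining[child].discard(job)
                # A child is released once its last dependency completes.
                if not remaining[child]:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if dispatched != len(deps):
        raise ValueError("plan contains a cycle, so it is not a DAG")
    return waves


# Hypothetical plan for a query that joins two scans and then aggregates.
plan = {
    "scan_a": set(),
    "scan_b": set(),
    "join": {"scan_a", "scan_b"},
    "aggregate": {"join"},
}
waves = dispatch_order(plan)
```

For this plan the two scans are released together, followed by the join and then the aggregation. A real engine releases each job as soon as its own dependencies complete rather than in synchronized waves.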

Identifier

85054831007 (Scopus)

ISBN

[9781450365239]

Publication Title

ACM International Conference Proceeding Series

External Full Text Location

https://doi.org/10.1145/3229710.3229713
