Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems
Document Type
Conference Proceeding
Publication Date
4-8-2019
Abstract
The explosive growth of data in various scientific, industrial, and business domains necessitates the use of big data processing systems, such as Hadoop, which are typically deployed in a physical or cloud-based cluster shared by many users running parallel jobs. As the user population and application scale increase, such systems are expanded from time to time with the addition of new nodes of different types, making the cluster highly heterogeneous. Job scheduling in such systems largely determines the performance of big data applications and remains a challenging problem. In this paper, we formulate a generic job scheduling problem for parallel processing of big data in heterogeneous clusters and design a k-means-based task scheduling algorithm, referred to as KMTS. Simulation results show that KMTS improves execution performance over existing methods by 25% on average in single-job scheduling and by 30% on average in parallel-job scheduling. The performance superiority is also confirmed by real experiments in high-performance computing environments.
Identifier
85064975553 (Scopus)
ISBN
9781538692233
Publication Title
2019 International Conference on Computing, Networking and Communications (ICNC 2019)
External Full Text Location
https://doi.org/10.1109/ICCNC.2019.8685520
First Page
22
Last Page
28
Grant
2018GY-011
Fund Ref
Northwest University
Recommended Citation
Xu, Mingrui; Wu, Chase Q.; Hou, Aiqin; and Wang, Yongqiang, "Intelligent Scheduling for Parallel Jobs in Big Data Processing Systems" (2019). Faculty Publications. 7664.
https://digitalcommons.njit.edu/fac_pubs/7664
