LAS: Logical-Block Affinity Scheduling in Big Data Analytics Systems
Document Type
Conference Proceeding
Publication Date
10-8-2018
Abstract
Parallel computing combined with distributed data storage and management has been adopted by most big data analytics systems. Scheduling computing tasks to improve data locality is crucial to the performance of such systems. While existing schedulers target near-data scheduling on top of physical data blocks, these systems face a new scheduling problem in which computing tasks process table-based datasets directly and access large physical blocks indirectly through indices stored in associated small logical blocks. This new problem invalidates the basic assumption made by many existing near-data scheduling algorithms. In this paper, we propose a Logical-block Affinity Scheduling (LAS) algorithm that coordinates the near-data scheduling of computing tasks with the placement of logical blocks, striking a balance between data locality and load balancing to maximize system throughput. The proposed algorithm is implemented and evaluated using a well-known big data benchmark and a practical production system deployed in public clouds. Extensive experimental results show that LAS outperforms three existing scheduling algorithms.
Identifier
85056161078 (Scopus)
ISBN
9781538641286
Publication Title
Proceedings IEEE INFOCOM
External Full Text Location
https://doi.org/10.1109/INFOCOM.2018.8486297
ISSN
0743-166X
First Page
522
Last Page
530
Volume
2018-April
Grant
61202040
Fund Ref
National Natural Science Foundation of China
Recommended Citation
Bao, Liang; Wu, Chase Q.; Qi, Haiyang; Chen, Weizhao; Zhang, Xin; Han, Weina; Wei, Wei; Tail, En; Wang, Hao; Zhai, Jiahao; and Chen, Xiang, "LAS: Logical-Block Affinity Scheduling in Big Data Analytics Systems" (2018). Faculty Publications. 8339.
https://digitalcommons.njit.edu/fac_pubs/8339
