BPS: Batching, Pipelining, Surgeon of Continuous Deep Inference on Collaborative Edge Intelligence
Document Type
Article
Publication Date
1-1-2024
Abstract
Users at the network edge generate deep inference requests continuously over time. Mobile/edge devices located near users can perform the inference computation locally on their behalf, e.g., the embedded edge device on an autonomous vehicle. Because computing resources on a single mobile/edge device are limited, it can be challenging to process users' inference requests with high throughput. An attractive solution is to (partially) offload the computation to a remote device in the network. In this paper, we examine existing inference execution solutions across local and remote devices and propose an adaptive scheduler, the BPS scheduler, for continuous deep inference on collaborative edge intelligence. By leveraging data parallelism, neurosurgeon-style model partitioning, and reinforcement learning techniques, BPS boosts overall inference performance by up to 8.2× over baseline schedulers. A lightweight compressor, FF, specialized in compressing the intermediate output data produced by neurosurgeon-style partitioning, is proposed and integrated into the BPS scheduler. FF exploits the operating characteristics of convolutional layers and uses efficient approximation algorithms. Compared to existing compression methods, FF achieves up to 86.9% lower accuracy loss and up to 83.6% lower latency overhead.
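To make the partitioned-execution idea concrete, below is a minimal sketch (Python/PyTorch) of neurosurgeon-style split inference: a model is cut at a layer boundary, the head runs on the local device, and the intermediate activation is shipped to a remote device that runs the tail. The toy model, the split point, and the in-process "remote" execution are assumptions for illustration only, not the paper's actual BPS implementation.

    # Illustrative sketch of neurosurgeon-style split inference.
    # The model, split point, and in-process "remote" tail are hypothetical.
    import torch
    import torch.nn as nn

    # A toy convolutional model standing in for a real DNN.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )

    split = 2  # hypothetical partition point chosen by a scheduler
    head = nn.Sequential(*list(model.children())[:split])  # runs on-device
    tail = nn.Sequential(*list(model.children())[split:])  # runs remotely

    x = torch.randn(1, 3, 32, 32)  # one inference request
    with torch.no_grad():
        z = head(x)  # intermediate activation to transmit over the network
        # In a real deployment, z would be compressed (e.g., by a compressor
        # like FF) before transmission; here the "remote" tail runs locally.
        y = tail(z)
    print(y.shape)  # torch.Size([1, 10])

In a BPS-like system, the split point would be selected adaptively per device and network condition, and the transmitted activation z is the data that FF targets for compression.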
Identifier
85192765162 (Scopus)
Publication Title
IEEE Transactions on Cloud Computing
External Full Text Location
https://doi.org/10.1109/TCC.2024.3399616
e-ISSN
2168-7161
First Page
830
Last Page
843
Issue
3
Volume
12
Grant
2147623
Fund Ref
National Science Foundation
Recommended Citation
Hou, Xueyu; Guan, Yongjie; Choi, Nakjung; and Han, Tao, "BPS: Batching, Pipelining, Surgeon of Continuous Deep Inference on Collaborative Edge Intelligence" (2024). Faculty Publications. 1025.
https://digitalcommons.njit.edu/fac_pubs/1025