BPS: Batching, Pipelining, Surgeon of Continuous Deep Inference on Collaborative Edge Intelligence

Document Type

Article

Publication Date

1-1-2024

Abstract

Users at the edge generate deep inference requests continuously over time. Mobile/edge devices located near users, e.g., the embedded device on an autonomous vehicle, can perform the inference computation locally on their behalf. Because a single mobile/edge device has limited computing resources, it can be challenging to serve users' inference requests at high throughput. An attractive solution is to (partially) offload the computation to a remote device in the network. In this paper, we examine existing inference execution solutions across local and remote devices and propose an adaptive scheduler, BPS, for continuous deep inference on collaborative edge intelligence. By leveraging data parallelism, neurosurgeon-style model partitioning, and reinforcement learning, BPS boosts overall inference performance by up to 8.2× over baseline schedulers. A lightweight compressor, FF, specialized in compressing the intermediate output data at the neurosurgeon partition point, is proposed and integrated into the BPS scheduler. FF exploits the operating characteristics of convolutional layers and utilizes efficient approximation algorithms. Compared to existing compression methods, FF achieves up to 86.9% lower accuracy loss and up to 83.6% lower latency overhead.
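The split-then-offload pipeline the abstract describes can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions: a sequential PyTorch model, a fixed split point, and a simple sparsity-plus-fp16 encoder standing in for the intermediate-feature compressor. The names (split_point, compress, decompress) and the encoding scheme are illustrative only; they are not the paper's BPS or FF implementation, which is not specified on this page.

# Minimal sketch of neurosurgeon-style split inference with a toy
# intermediate-feature compressor. Illustrative only; not the paper's
# BPS scheduler or FF compressor.
import io
import numpy as np
import torch
import torch.nn as nn

model = nn.Sequential(               # toy stand-in for a real CNN
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

split_point = 4  # layers [0, split_point) run locally, the rest remotely

def compress(t: torch.Tensor) -> bytes:
    # Hypothetical lossy encoder: conv+ReLU outputs are typically sparse
    # and non-negative, so keep only non-zeros, quantized to fp16.
    a = t.detach().numpy().ravel()
    idx = np.flatnonzero(a)
    buf = io.BytesIO()
    np.savez(buf, shape=np.array(t.shape), idx=idx.astype(np.uint32),
             val=a[idx].astype(np.float16))
    return buf.getvalue()

def decompress(blob: bytes) -> torch.Tensor:
    z = np.load(io.BytesIO(blob))
    a = np.zeros(int(np.prod(z["shape"])), dtype=np.float32)
    a[z["idx"]] = z["val"]
    return torch.from_numpy(a.reshape(tuple(z["shape"])))

with torch.no_grad():
    x = torch.randn(1, 3, 32, 32)
    head = model[:split_point](x)    # local device: run the front layers
    blob = compress(head)            # shrink the tensor before the uplink
    tail_in = decompress(blob)       # remote device: reconstruct input
    logits = model[split_point:](tail_in)
print(f"sent {len(blob)} bytes vs {head.numel() * 4} raw")

In a real deployment the split point would be chosen adaptively, as BPS does, trading local compute against bytes sent over the uplink.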

Identifier

85192765162 (Scopus)

Publication Title

IEEE Transactions on Cloud Computing

External Full Text Location

https://doi.org/10.1109/TCC.2024.3399616

e-ISSN

2168-7161

First Page

830

Last Page

843

Issue

3

Volume

12

Grant

2147623

Fund Ref

National Science Foundation
