Reinforcement Learning Based Online Request Scheduling Framework for Workload-Adaptive Edge Deep Learning Inference

Document Type

Article

Publication Date

1-1-2024

Abstract

Recent advances in deep learning across mobile and Internet-of-Things applications, coupled with the emergence of edge computing, have led to a strong trend of performing deep learning inference on edge servers located physically close to end devices. This trend raises the challenge of meeting the quality-of-service requirements of inference tasks at the resource-constrained network edge, especially under variable or even bursty inference workloads. Solutions to this challenge have not yet been reported in the related literature. In the present paper, we tackle this challenge by means of workload-adaptive inference request scheduling: in different workload states, adaptive scheduling policies let models of diverse sizes play different roles in maintaining high-quality inference service. To implement this idea, we propose a request scheduling framework for general-purpose edge inference serving systems. Theoretically, we prove that, in our framework, the problem of optimizing the inference request scheduling policies can be formulated as a Markov decision process (MDP). To solve this MDP, we use reinforcement learning and propose a policy optimization approach. Through extensive experiments, we empirically demonstrate the effectiveness of our framework in the challenging practical case where the MDP is partially observable.
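To illustrate the core idea of the abstract, the sketch below casts workload-adaptive model selection as a small MDP solved with tabular Q-learning. Everything here is hypothetical: the two models, the workload levels, the latency budgets, and the reward function are invented for illustration and are not the paper's actual system or algorithm.

```python
import random

# Hypothetical setup: a small fast model vs. a large accurate model,
# and three workload states with per-request latency budgets.
MODELS = {"small": {"latency": 1.0, "accuracy": 0.80},
          "large": {"latency": 3.0, "accuracy": 0.95}}
WORKLOADS = ["low", "medium", "high"]           # observed arrival intensity
BUDGET = {"low": 4.0, "medium": 2.0, "high": 1.2}  # latency budget per state

def reward(workload, model):
    """Accuracy payoff minus a penalty when latency exceeds the budget."""
    m = MODELS[model]
    penalty = max(0.0, m["latency"] - BUDGET[workload])
    return m["accuracy"] - penalty

def train(episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; workload states
    arrive exogenously, so each step is a one-step decision."""
    rng = random.Random(seed)
    q = {(w, a): 0.0 for w in WORKLOADS for a in MODELS}
    for _ in range(episodes):
        w = rng.choice(WORKLOADS)
        if rng.random() < epsilon:
            a = rng.choice(list(MODELS))
        else:
            a = max(MODELS, key=lambda m: q[(w, m)])
        q[(w, a)] += alpha * (reward(w, a) - q[(w, a)])
    # Greedy policy: best model per workload state.
    return {w: max(MODELS, key=lambda m: q[(w, m)]) for w in WORKLOADS}

policy = train()
print(policy)  # large model under light load, small model under heavy load
```

The learned policy captures the workload-adaptive behavior the abstract describes: when load is light, the latency budget is slack and the large, accurate model wins; as load grows, the small model's low latency dominates.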

Identifier

85199319750 (Scopus)

Publication Title

IEEE Transactions on Mobile Computing

External Full Text Location

https://doi.org/10.1109/TMC.2024.3429571

e-ISSN

1558-0660

ISSN

1536-1233

First Page

13222

Last Page

13239

Issue

12

Volume

23

Grant

2019YFB1005200

Fund Ref

China Scholarship Council
