Completion Time Minimization for Data Collection in a UAV-enabled IoT Network: A Deep Reinforcement Learning Approach

Document Type

Article

Publication Date

11-1-2023

Abstract

In this article, we study completion time minimization in an unmanned aerial vehicle (UAV)-enabled Internet of Things (IoT) network, where the UAV collects all the data generated by the ground IoT devices for further processing. To simplify the analysis, the continuous time horizon is discretized into time slots. The duration of each slot is kept below a pre-defined threshold so that the UAV's location can be treated as unchanged within each slot. We aim to minimize the UAV's completion time by optimizing the association scheme of the IoT devices together with the UAV's location (i.e., its trajectory) and velocity at each time slot. The formulated problem is challenging to solve with traditional optimization methods because the number of time slots is unknown in advance (which makes the number of decision variables unknown) and the objective involves non-convex functions. We therefore reformulate it as a Markov decision process (MDP) and propose a deep deterministic policy gradient (DDPG)-based method to solve it efficiently. The DDPG-based algorithm uses deep function approximators, learning a deterministic actor rather than searching for the action that maximizes the state-action value, and is therefore well suited to high-dimensional, continuous control problems. Extensive numerical results are presented to validate the effectiveness of the proposed algorithm.
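As a rough illustration of the time discretization described in the abstract (a sketch, not the authors' implementation), the snippet below picks a slot duration small enough that the UAV's per-slot displacement stays within a tolerance, then counts the slots a UAV needs to reach a sequence of device locations. All numeric values (`V_MAX`, `EPS`, `DELTA_MAX`) and the straight-line visiting order are illustrative assumptions.

```python
import math

# Illustrative parameters (assumed, not taken from the paper):
V_MAX = 20.0      # maximum UAV speed (m/s)
EPS = 5.0         # allowed per-slot displacement (m), so the UAV is "quasi-static"
DELTA_MAX = 0.5   # pre-defined slot-duration threshold (s)

# Slot duration: the largest delta satisfying both V_MAX * delta <= EPS
# and the pre-defined threshold delta <= DELTA_MAX.
delta = min(EPS / V_MAX, DELTA_MAX)

def fly_one_slot(pos, target, v=V_MAX):
    """Advance the UAV one slot from `pos` toward `target` at speed `v`."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= v * delta:
        return target  # the UAV reaches the device within this slot
    return (pos[0] + v * delta * dx / dist, pos[1] + v * delta * dy / dist)

def completion_slots(start, devices):
    """Count slots needed to visit every device in the given (fixed) order."""
    pos, slots = start, 0
    for dev in devices:
        while pos != dev:
            pos = fly_one_slot(pos, dev)
            slots += 1
    return slots
```

With the assumed values, `delta = 0.25` s, so the UAV covers at most 5 m per slot; flying from `(0, 0)` to `(10, 0)` and then `(10, 10)` takes 4 slots. In the paper's setting the visiting order and velocity per slot are themselves decision variables, which is what the DDPG agent optimizes.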

Identifier

85161051996 (Scopus)

Publication Title

IEEE Transactions on Vehicular Technology

External Full Text Location

https://doi.org/10.1109/TVT.2023.3280848

e-ISSN

1939-9359

ISSN

0018-9545

First Page

14734

Last Page

14742

Issue

11

Volume

72

Grant

CNS-1814748

Fund Ref

National Science Foundation
