Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL
Document Type
Conference Proceeding
Publication Date
1-1-2024
Abstract
We study constrained Markov decision processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. Existing approaches have primarily focused on soft constraint violation, which allows compensation across episodes, making it easier to satisfy the constraints. In contrast, we consider a stronger hard constraint violation metric, where only positive constraint violations are accumulated. Our main result is the development of the first model-free, simulator-free algorithm that achieves a sub-linear regret and a sub-linear hard constraint violation simultaneously, even in large-scale systems. In particular, we show that Õ(√(d³H⁴K)) regret and Õ(√(d³H⁴K)) hard constraint violation bounds can be achieved, where K is the number of episodes, d is the dimension of the feature mapping, and H is the length of each episode. Our results are achieved via novel adaptations of the primal-dual LSVI-UCB algorithm: it searches for the dual variable that balances regret and constraint violation within every episode, rather than updating it at the end of each episode. This turns out to be crucial for our theoretical guarantees when dealing with hard constraint violations.
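The within-episode dual-variable search described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a hypothetical one-step toy in which, given estimated reward and utility action-values (`q_r`, `q_g`) at a single state, we bisect over the dual variable λ to find the smallest penalty whose greedy action under the composite value q_r + λ·q_g meets a utility threshold. The function name, signature, and the monotonicity assumption behind the bisection are all illustrative assumptions.

```python
import numpy as np

def dual_search(q_r, q_g, threshold, lam_max=10.0, iters=30):
    """Hypothetical sketch: bisect over the dual variable lam.

    q_r, q_g: shape-(n_actions,) arrays of estimated reward / utility
    action-values at a single state. We look for the smallest lam in
    [0, lam_max] whose greedy action w.r.t. q_r + lam * q_g has
    estimated utility >= threshold (assumes this utility is monotone
    in lam, which holds here but not in general).
    """
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        a = int(np.argmax(q_r + mid * q_g))
        if q_g[a] >= threshold:
            hi = mid  # constraint met: try a smaller penalty
        else:
            lo = mid  # constraint violated: increase the penalty
    a = int(np.argmax(q_r + hi * q_g))
    return hi, a
```

In this toy, with `q_r = [1.0, 0.2]`, `q_g = [0.0, 1.0]`, and threshold 0.5, the search converges to λ ≈ 0.8, the point where the constrained action overtakes the reward-greedy one; tuning λ per episode (rather than after it) mirrors the paper's motivation that only positive violations count, so the penalty must be right before the episode is rolled out.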
Identifier
85194197257 (Scopus)
Publication Title
Proceedings of Machine Learning Research
e-ISSN
2640-3498
First Page
1054
Last Page
1062
Volume
238
Grant
CNS-2112471
Fund Ref
Ohio State University
Recommended Citation
Ghosh, Arnob; Zhou, Xingyu; and Shroff, Ness, "Towards Achieving Sub-linear Regret and Hard Constraint Violation in Model-free RL" (2024). Faculty Publications. 988.
https://digitalcommons.njit.edu/fac_pubs/988