SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

Document Type

Conference Proceeding

Publication Date

1-1-2024

Abstract

This paper studies the transfer reinforcement learning (RL) problem in which multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared with traditional RL methods such as Q-learning. However, its theoretical foundations remain largely unestablished, especially when the successor features are learned with deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as the deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
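A minimal sketch of the decomposition and the GPI rule described in the abstract, written in the notation standard in the successor-feature literature (the symbols φ, ψ, w, and the discount factor γ are assumptions here, since the abstract itself does not define them): task i's reward is assumed linear in a shared feature map, the SF accumulates those features under policy π_i, and GPI acts greedily over the best available Q-estimate.

\[
r_i(s, a, s') = \phi(s, a, s')^\top w_i, \qquad
Q^{\pi_i}(s, a) = \psi^{\pi_i}(s, a)^\top w_i,
\]
\[
\psi^{\pi_i}(s, a) = \mathbb{E}_{\pi_i}\!\Big[\sum_{t \ge 0} \gamma^{t} \, \phi(s_t, a_t, s_{t+1}) \;\Big|\; s_0 = s,\, a_0 = a\Big],
\qquad
\pi_{\mathrm{GPI}}(s) \in \arg\max_{a} \max_{i} Q^{\pi_i}(s, a).
\]

Under this decomposition, only the reward weights w_i change across tasks, while the successor features ψ capture the shared transition dynamics, which is what enables transfer across the task family.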

Identifier

85203814855 (Scopus)

Publication Title

Proceedings of Machine Learning Research

e-ISSN

2640-3498

First Page

58897

Last Page

58934

Volume

235

Grant

FA9550-20-1-0122

Fund Ref

International Business Machines Corporation
