Learning locomotion skills via model-based proximal meta-reinforcement learning
Document Type
Conference Proceeding
Publication Date
10-1-2019
Abstract
Model-based reinforcement learning methods provide a promising direction for a range of automated applications, such as autonomous vehicles and legged robots, due to their sample efficiency. However, their asymptotic performance in locomotion control domains is usually inferior to that of state-of-the-art model-free reinforcement learning methods. One main challenge of model-based reinforcement learning is learning a dynamics model that is accurate enough for planning. This paper mitigates this issue by meta-reinforcement learning from an ensemble of dynamics models. A policy learns from dynamics models that hold different beliefs about the real environment, which improves its adaptability and its tolerance to model inaccuracy. A proximal meta-reinforcement learning algorithm is introduced to improve computational efficiency and reduce the variance of higher-order gradient estimates. Heteroscedastic noise is added to the training dataset, leading to robust and efficient model learning. Subsequently, proximal meta-reinforcement learning maximizes the expected return by sampling 'imaginary' trajectories from the learned dynamics models; this requires no real environment data and can be deployed on many servers in parallel to speed up the whole learning process. The aim of this work is to reduce the sample complexity and computational cost of reinforcement learning in robot locomotion tasks. Simulation experiments show that the proposed algorithm achieves asymptotic performance comparable to state-of-the-art model-free reinforcement learning methods with significantly fewer samples, confirming our theoretical results.
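The core idea of training a policy against an ensemble of learned dynamics models can be illustrated with a minimal sketch. This is not the authors' implementation: the linear dynamics, the per-member noise scales, and the per-step random choice of ensemble member are simplifying assumptions made here purely to show how 'imaginary' rollouts expose the policy to models holding different beliefs about the environment.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearDynamicsModel:
    """One ensemble member: a (hypothetical) learned linear model
    s' = A s + B a + noise, with a member-specific noise scale to
    mimic heteroscedastic model uncertainty."""
    def __init__(self, state_dim, action_dim, noise_scale):
        self.A = np.eye(state_dim) + 0.01 * rng.standard_normal((state_dim, state_dim))
        self.B = 0.1 * rng.standard_normal((state_dim, action_dim))
        self.noise_scale = noise_scale

    def step(self, s, a):
        # Predict the next state under this member's belief of the dynamics.
        return self.A @ s + self.B @ a + self.noise_scale * rng.standard_normal(s.shape)

def imaginary_rollout(models, policy, s0, horizon):
    """Sample an 'imaginary' trajectory without touching the real environment:
    at each step a random ensemble member predicts the transition, so the
    policy is trained across differing beliefs of the dynamics."""
    s, traj = s0, []
    for _ in range(horizon):
        a = policy(s)
        model = models[rng.integers(len(models))]
        s_next = model.step(s, a)
        traj.append((s, a, s_next))
        s = s_next
    return traj
```

Because such rollouts need no interaction with the real robot, many of them can be generated in parallel, which is what enables the speed-up described in the abstract.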
Identifier
85076739399 (Scopus)
ISBN
9781728145693
Publication Title
Conference Proceedings: IEEE International Conference on Systems, Man and Cybernetics
External Full Text Location
https://doi.org/10.1109/SMC.2019.8914406
ISSN
1062-922X
First Page
1545
Last Page
1550
Volume
2019-October
Grant
2018YFB1304600
Fund Ref
National Natural Science Foundation of China
Recommended Citation
Xiao, Qing; Cao, Zhengcai; and Zhou, Mengchu, "Learning locomotion skills via model-based proximal meta-reinforcement learning" (2019). Faculty Publications. 7290.
https://digitalcommons.njit.edu/fac_pubs/7290
