PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

06/11/2023
by Wensong Bai, et al.

In this paper, we propose the first fully push-forward-based Distributional Reinforcement Learning algorithm, called Push-forward-based Actor-Critic EncourageR (PACER). Specifically, PACER establishes a stochastic utility value policy gradient theorem and leverages the push-forward operator in the construction of both the actor and the critic. Moreover, a novel sample-based encourager, built on the maximum mean discrepancy (MMD), is designed to incentivize exploration. Experimental evaluations on various continuous control benchmarks demonstrate the superiority of our algorithm over state-of-the-art methods.
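
The abstract does not spell out how the MMD-based encourager is computed, so the following is only a minimal sketch of the general idea of a sample-based MMD estimate, not the authors' implementation. It assumes a Gaussian kernel and the biased (V-statistic) estimator of the squared MMD; the function names, the bandwidth, and the two sample sets (`policy_actions`, `reference_actions`) are hypothetical and chosen purely for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian kernel k(x_i, y_j) for x of shape (n, d), y of shape (m, d)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical data: actions drawn from a narrow policy and from a
# uniform reference distribution over a 2-D action space (assumption).
rng = np.random.default_rng(0)
policy_actions = rng.normal(0.0, 0.1, size=(64, 2))
reference_actions = rng.uniform(-1.0, 1.0, size=(64, 2))

gap = mmd_squared(policy_actions, reference_actions)
print(f"estimated MMD^2: {gap:.4f}")
```

Because the estimate depends only on samples, it can be evaluated for a policy whose density is not available in closed form, which is typically the case when actions are produced by pushing noise through a network.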
