The Potential of the Return Distribution for Exploration in RL

06/11/2018
by Thomas M. Moerland, et al.

This paper studies the potential of the return distribution for exploration in deterministic environments. We study network losses and propagation mechanisms for Gaussian, categorical, and mixture-of-Gaussians distributions. Combined with exploration policies that leverage this return distribution, we solve, for example, a randomized Chain task of length 100, a result that has not been reported before when learning with neural networks.
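
To make the idea of an exploration policy that leverages the return distribution concrete, here is a minimal sketch. The abstract does not say which policies the paper uses, so this is an assumption: it draws one sample per action from a Gaussian return distribution and acts greedily on the samples. The function name `sample_greedy_action` and the example means and standard deviations are hypothetical and not taken from the paper.

```python
import numpy as np


def sample_greedy_action(means, stds, rng):
    """Pick an action by sampling one return per action and taking the argmax.

    Assumption: each action's return is modelled as an independent Gaussian
    with the given mean and standard deviation (illustrative only).
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    samples = rng.normal(means, stds)  # one sampled return per action
    return int(np.argmax(samples))


rng = np.random.default_rng(0)

# Hypothetical two-action state: action 0 has a slightly higher mean but a
# certain return, action 1 has a lower mean but an uncertain return.
means = [1.0, 0.9]
stds = [0.0, 0.5]

choices = [sample_greedy_action(means, stds, rng) for _ in range(1000)]
counts = np.bincount(choices, minlength=2)
print(counts)  # how often each action was selected
```

Under these assumed numbers, a purely greedy policy on the means would always pick action 0, while sampling from the distribution still tries action 1 whenever its sampled return exceeds 1.0. That is the sense in which uncertainty in the return distribution can drive exploration.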
