Model-Based Action Exploration

01/11/2018
by   Glen Berseth, et al.

Deep reinforcement learning has made great strides in solving challenging motion control tasks. Recently there has been a significant amount of work on methods to exploit the data gathered during training, but less work on good methods for generating the data to learn from. For continuous action domains, the typical method for generating exploratory actions is to sample from a Gaussian distribution centred around the mean of a policy. Although these methods can find an optimal policy, in practice they do not scale well, and solving environments with many action dimensions becomes impractical. We consider learning a forward dynamics model to predict the result, s_{t+1}, of taking a particular action, a, given a specific observation of the state, s_t. With such a model we can do what comes more naturally to biological systems that have already collected experience: perform internal predictions of outcomes and endeavour to try only those actions we believe have a reasonable chance of success. This method greatly reduces the space of exploratory actions, increasing learning speed and enabling higher-quality solutions to difficult problems, such as robotic locomotion.
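As a rough illustration of the idea (a minimal sketch, not the paper's exact algorithm), the snippet below samples Gaussian candidate actions around the policy mean, "imagines" each outcome with a learned forward model, and acts on the most promising candidate. The callables dynamics(s, a) and value(s), along with all parameter names, are hypothetical placeholders assumed for this example.

    import numpy as np

    def explore_action(s_t, policy_mean, dynamics, value,
                       num_candidates=16, noise_std=0.2, rng=None):
        """Pick an exploratory action by internally predicting outcomes.

        dynamics(s, a) -> predicted next state s_{t+1}  (learned forward model)
        value(s)       -> scalar estimate of how promising a state is
        """
        rng = rng or np.random.default_rng()
        # Gaussian candidates centred on the policy mean (the baseline explorer)
        candidates = policy_mean + noise_std * rng.standard_normal(
            (num_candidates, policy_mean.shape[0]))
        # Internal prediction: imagine the next state for each candidate action
        predicted_states = [dynamics(s_t, a) for a in candidates]
        scores = np.array([value(s) for s in predicted_states])
        # Act on the candidate believed to have the best chance of success
        return candidates[np.argmax(scores)]

Compared with blindly executing every Gaussian sample in the environment, filtering candidates through the forward model concentrates real interactions on actions that are predicted to be useful, which is the source of the claimed speed-up in high-dimensional action spaces.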
