Unsupervised Imitation Learning

06/19/2018
by   Sebastian Curi, et al.
2

We introduce a novel method to learn a policy from unsupervised demonstrations of a process. Given a model of the system and a set of sequences of outputs, we find a policy that has a comparable performance to the original policy, without requiring access to the inputs of these demonstrations. We do so by first estimating the inputs of the system from observed unsupervised demonstrations. Then, we learn a policy by applying vanilla supervised learning algorithms to the (estimated)input-output pairs. For the input estimation, we present a new adaptive linear estimator (AdaL-IE) that explicitly trades-off variance and bias in the estimation. As we show empirically, AdaL-IE produces estimates with lower error compared to the state-of-the-art input estimation method, (UMV-IE) [Gillijns and De Moor, 2007]. Using AdaL-IE in conjunction with imitation learning enables us to successfully learn control policies that consistently outperform those using UMV-IE.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset