Sample-Efficient Imitation Learning via Generative Adversarial Nets
Recent work in imitation learning builds its formulation around the GAIL architecture, relying on the adversarial training procedure introduced in GANs. Although successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from high sample complexity: it needs a large number of interactions with the environment to achieve satisfactory performance. In this work, we dramatically reduce the number of interactions with the environment by leveraging an off-policy actor-critic architecture. Additionally, employing deterministic policy gradients allows us to treat the learned reward as a differentiable node in the computational graph, while preserving the model-free nature of our approach. Our experiments span a variety of continuous control tasks.
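To make the remark about the learned reward being a differentiable node more concrete, here is a minimal sketch of the idea, not the paper's implementation. It assumes a scalar state and action, a linear deterministic policy, and a logistic discriminator whose log-output serves as the reward; the names `policy`, `reward`, `reward_grad_theta`, and `finite_difference` are hypothetical and chosen only for illustration. Because the action is a deterministic function of the policy parameter, the reward gradient follows from the chain rule, and a finite-difference check confirms it.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy(theta, s):
    # Assumed deterministic linear policy: the action is computed, not sampled.
    return theta * s


def reward(w, s, a):
    # Assumed learned reward: log D(s, a) for a logistic discriminator D.
    w_s, w_a, b = w
    return math.log(sigmoid(w_s * s + w_a * a + b))


def reward_grad_theta(theta, w, s):
    # Chain rule through the reward node: dr/dtheta = (dr/da) * (da/dtheta).
    w_s, w_a, b = w
    a = policy(theta, s)
    z = w_s * s + w_a * a + b
    dr_da = (1.0 - sigmoid(z)) * w_a  # derivative of log(sigmoid(z)) w.r.t. a
    da_dtheta = s                     # derivative of theta * s w.r.t. theta
    return dr_da * da_dtheta


def finite_difference(theta, w, s, eps=1e-6):
    # Central-difference estimate used only to check the analytic gradient.
    up = reward(w, s, policy(theta + eps, s))
    down = reward(w, s, policy(theta - eps, s))
    return (up - down) / (2.0 * eps)


if __name__ == "__main__":
    theta, w, s = 0.5, (0.3, -0.7, 0.1), 1.2
    print("analytic:", reward_grad_theta(theta, w, s))
    print("numeric: ", finite_difference(theta, w, s))
```

With a stochastic policy the action would be sampled, so this direct chain rule would not apply without an additional estimator. The sketch covers only the reward gradient and leaves out the critic and the off-policy replay that the abstract also mentions.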