γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

10/27/2020
by   Michael Janner, et al.
15

We introduce the γ-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with γ-models leads to generalizations of the procedures that form the foundation of model-based control, including the model rollout and model-based value estimation. The γ-model, trained with a generative reinterpretation of temporal difference learning, is a natural continuous analogue of the successor representation and a hybrid between model-free and model-based mechanisms. Like a value function, it contains information about the long-term future; like a standard predictive model, it is independent of task reward. We instantiate the γ-model as both a generative adversarial network and normalizing flow, discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors, and empirically investigate its utility for prediction and control.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro