Conditional Importance Sampling for Off-Policy Learning

10/16/2019
by Mark Rowland, et al.

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives on, and a better understanding of, existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space and concretely investigate several algorithms that arise from this framework.
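
The central idea is to replace the ordinary importance sampling ratio pi(a|s) / mu(a|s) with its conditional expectation given some statistic of the data. As a rough, minimal sketch of that idea only (not of the paper's specific algorithms), the Python snippet below compares an ordinary per-decision ratio with its conditional expectation given the action taken, assuming a uniform state distribution under the behaviour policy; the names pi, mu, phi and the uniform-state assumption are illustrative placeholders, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3

# Hypothetical target (pi) and behaviour (mu) policies; each row sums to 1.
pi = rng.dirichlet(np.ones(n_actions), size=n_states)
mu = rng.dirichlet(np.ones(n_actions), size=n_states)

def is_ratio(s, a):
    """Ordinary per-decision importance sampling ratio pi(a|s) / mu(a|s)."""
    return pi[s, a] / mu[s, a]

def conditional_is_ratio(s, a, phi):
    """Conditional expectation of the IS ratio given the statistic phi(s, a).

    The ratio at (s, a) is replaced by the behaviour-weighted average of
    ratios over all state-action pairs sharing the same value of phi,
    assuming (for illustration) a uniform state distribution.
    """
    key = phi(s, a)
    num, den = 0.0, 0.0
    for s2 in range(n_states):
        for a2 in range(n_actions):
            if phi(s2, a2) == key:
                w = mu[s2, a2] / n_states  # joint prob. under the assumed uniform state dist.
                num += w * is_ratio(s2, a2)
                den += w
    return num / den

# Example: condition only on the action taken (one possible choice of statistic).
phi_action = lambda s, a: a
s, a = 1, 2
print("ordinary ratio:   ", is_ratio(s, a))
print("conditional ratio:", conditional_is_ratio(s, a, phi_action))

Because the conditional ratio averages over many state-action pairs, it is typically less variable than the raw ratio while keeping the same conditional mean, which is the kind of trade-off the framework is designed to expose.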
