Increasing the Action Gap: New Operators for Reinforcement Learning

12/15/2015
by Marc G. Bellemare, et al.

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
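To make the idea of a gap-increasing, optimality-preserving operator concrete, the following is a minimal sketch on a small random tabular MDP. The abstract does not state the operator definitions; the forms used here are recalled from the full paper and should be treated as assumptions: the consistent Bellman operator replaces max_b Q(x', b) with Q(x, a) on self-transitions, and advantage learning subtracts alpha * (V(x) - Q(x, a)) from the standard Bellman backup. The MDP, its size, and the values of gamma and alpha are all hypothetical.

```python
import numpy as np

def bellman(Q, P, R, gamma):
    """Standard Bellman optimality backup. P has shape (S, A, S), R has shape (S, A)."""
    V = Q.max(axis=1)
    return R + gamma * (P @ V)

def consistent_bellman(Q, P, R, gamma):
    """Assumed form: on a self-transition x' == x, back up Q(x, a) instead of max_b Q(x, b)."""
    n_states = Q.shape[0]
    V = Q.max(axis=1)
    idx = np.arange(n_states)
    p_self = P[idx, :, idx]  # shape (S, A): P(x | x, a)
    return R + gamma * (P @ V - p_self * V[:, None] + p_self * Q)

def advantage_learning(Q, P, R, gamma, alpha):
    """Assumed form: Bellman backup minus alpha times the current action gap."""
    V = Q.max(axis=1, keepdims=True)
    return bellman(Q, P, R, gamma) - alpha * (V - Q)

def iterate(op, Q0, n_iters=2000):
    """Apply an operator repeatedly, starting from Q0."""
    Q = Q0.copy()
    for _ in range(n_iters):
        Q = op(Q)
    return Q

def action_gap(Q):
    """Per-state difference between the best and second-best action values."""
    s = np.sort(Q, axis=1)
    return s[:, -1] - s[:, -2]

# Hypothetical random MDP: 6 states, 3 actions, dense transitions.
rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha = 6, 3, 0.9, 0.5
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))
Q0 = np.zeros((n_states, n_actions))

Q_b = iterate(lambda Q: bellman(Q, P, R, gamma), Q0)
Q_c = iterate(lambda Q: consistent_bellman(Q, P, R, gamma), Q0)
Q_al = iterate(lambda Q: advantage_learning(Q, P, R, gamma, alpha), Q0)

for name, Q in [("Bellman", Q_b), ("consistent", Q_c), ("advantage learning", Q_al)]:
    print(name, "greedy policy:", Q.argmax(axis=1), "mean gap:", action_gap(Q).mean())
```

Under these assumptions, all three iterations should agree on the greedy policy and on the state values max_a Q(x, a), while the two alternative operators yield action gaps at least as large as the standard one. This is the sense in which such an operator preserves optimality while making the greedy policy less sensitive to small errors in Q.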


