A Note on the Linear Convergence of Policy Gradient Methods

07/21/2020
by Jalaj Bhandari, et al.

We revisit the finite-time analysis of policy gradient methods in the simplest setting: finite state and action problems with a policy class consisting of all stochastic policies and with exact gradient evaluations. Some recent works have viewed these problems as instances of smooth nonlinear optimization problems, suggesting small stepsizes and showing sublinear convergence rates. This note instead takes a policy iteration perspective and highlights that many versions of policy gradient succeed with extremely large stepsizes and attain a linear rate of convergence.
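The policy-iteration view can be made concrete on a toy problem. Below is a minimal sketch, not code from the paper: exact policy evaluation on a hypothetical 2-state, 2-action MDP, followed by projected policy gradient steps under the direct (tabular) parameterization. With an extremely large stepsize, the projected update puts all probability mass on the greedy action, so the iteration coincides with policy iteration and converges linearly. The MDP, the stepsize eta, and the omission of the (strictly positive) state-occupancy weighting in the gradient are all simplifying assumptions; the weighting does not change the large-stepsize limit.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
# P[s, a, s'] is the transition kernel, r[s, a] the reward, gamma the discount.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
r = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9
n_states, n_actions = r.shape

def policy_eval(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi."""
    P_pi = np.einsum('sa,sab->sb', pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, r)     # expected one-step reward under pi
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                   # Q[s, a] via the Bellman equation
    return V, Q

def project_rows_to_simplex(x):
    """Euclidean projection of each row of x onto the probability simplex."""
    u = np.sort(x, axis=1)[:, ::-1]
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, x.shape[1] + 1)
    rho = (u - css / idx > 0).sum(axis=1)
    theta = css[np.arange(x.shape[0]), rho - 1] / rho
    return np.maximum(x - theta[:, None], 0.0)

# Projected policy gradient under the direct (tabular) parameterization.
# The exact gradient is proportional to occupancy-weighted Q-values; the
# positive occupancy weights are dropped here since they do not change the
# per-state argmax, and hence not the large-stepsize limit.
pi = np.full((n_states, n_actions), 1.0 / n_actions)
eta = 1e6  # "extremely large" stepsize: the update collapses to policy iteration
for t in range(5):
    V, Q = policy_eval(pi)
    pi = project_rows_to_simplex(pi + eta * Q)
    print(f"iter {t}: V = {np.round(V, 4)}")
```

Running this sketch, the value function jumps to its optimum after the first greedy update and then stays fixed, as policy iteration on a finite MDP terminates in finitely many steps; with moderate stepsizes the same update instead improves the policy gradually.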
