Stability of Stochastic Approximations with `Controlled Markov' Noise and Temporal Difference Learning
In this paper we present a `stability theorem' for stochastic approximation (SA) algorithms with `controlled Markov' noise. Such algorithms were first studied by Borkar in 2006. Specifically, sufficient conditions are presented which guarantee the stability of the iterates. Further, under these conditions the iterates are shown to track a solution to the differential inclusion defined in terms of the ergodic occupation measures associated with the `controlled Markov' process. As an application to our main result we present an improvement to a general form of temporal difference learning algorithms. Specifically, we present sufficient conditions for their stability and convergence using our framework. This paper builds on the works of Borkar as well as Benveniste, Metivier and Priouret.
READ FULL TEXT