The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
The paper concerns convergence and asymptotic statistics for stochastic approximation driven by Markovian noise: θ_n+1= θ_n + α_n + 1 f(θ_n, Φ_n+1) , n≥ 0, in which each θ_n∈^d, {Φ_n } is a Markov chain on a general state space X with stationary distribution π, and f:^d×X→^d. In addition to standard Lipschitz bounds on f, and conditions on the vanishing step-size sequence {α_n}, it is assumed that the associated ODE is globally asymptotically stable with stationary point denoted θ^*, where f̅(θ)=E[f(θ,Φ)] with Φ∼π. Moreover, the ODE@∞ defined with respect to the vector field, f̅_∞(θ):= lim_r→∞ r^-1f̅(rθ) , θ∈^d, is asymptotically stable. The main contributions are summarized as follows: (i) The sequence θ is convergent if Φ is geometrically ergodic, and subject to compatible bounds on f. The remaining results are established under a stronger assumption on the Markov chain: A slightly weaker version of the Donsker-Varadhan Lyapunov drift condition known as (DV3). (ii) A Lyapunov function is constructed for the joint process {θ_n,Φ_n} that implies convergence of {θ_n} in L_4. (iii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error z_n:= (θ_n-θ^*)/√(α_n). Moment bounds combined with the CLT imply convergence of the normalized covariance, lim_n →∞ E [ z_n z_n^T ] = Σ_θ, where Σ_θ is the asymptotic covariance appearing in the CLT. (iv) An example is provided where the Markov chain Φ is geometrically ergodic but it does not satisfy (DV3). While the algorithm is convergent, the second moment is unbounded.
READ FULL TEXT