Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part I
In this work, we investigate stochastic quasi-Newton methods for minimizing a finite sum of cost functions over a decentralized network. In Part I, we develop a general algorithmic framework that incorporates stochastic quasi-Newton approximations with variance reduction so as to achieve fast convergence. At each time each node constructs a local, inexact quasi-Newton direction that asymptotically approaches the global, exact one. To be specific, (i) A local gradient approximation is constructed by using dynamic average consensus to track the average of variance-reduced local stochastic gradients over the entire network; (ii) A local Hessian inverse approximation is assumed to be positive definite with bounded eigenvalues, and how to construct it to satisfy these assumptions will be given in Part II. Compared to the existing decentralized stochastic first-order methods, the proposed general framework introduces the second-order curvature information without incurring extra sampling or communication. With a fixed step size, we establish the conditions under which the proposed general framework linearly converges to an exact optimal solution.
READ FULL TEXT