Convergence and Complexity of Stochastic Subgradient Methods with Dependent Data for Nonconvex Optimization

03/29/2022
by Ahmet Alacaoglu, et al.

We show that under a general dependent data sampling scheme, the classical stochastic projected and proximal subgradient methods for weakly convex functions have a worst-case rate of convergence of Õ(n^-1/4) and a complexity of Õ(ε^-4) for achieving an ε-near stationary point, measured by the norm of the gradient of the Moreau envelope. While the classical convergence guarantee requires i.i.d. data sampling from the target distribution, we only require a mild mixing condition on the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for the specific case of constrained smooth nonconvex optimization with dependent data from Õ(ε^-8) to Õ(ε^-4), with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for the adaptive stochastic subgradient algorithm AdaGrad and the stochastic subgradient algorithm with heavy-ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and an optimal rate-of-convergence guarantee.
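For intuition, below is a minimal sketch of the kind of method the abstract refers to: a stochastic projected subgradient iteration whose samples are drawn from a Markov chain rather than i.i.d. from the target distribution. This is not the paper's implementation; the box constraint, the toy finite-state sampler, and the c/√(k+1) step size are illustrative assumptions.

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box constraint [lo, hi]^d (illustrative choice)."""
    return np.clip(x, lo, hi)

def markov_chain_samples(P, states, start_idx, n):
    """Yield n dependent samples from a finite-state Markov chain with
    transition matrix P (rows sum to 1) over the list `states`."""
    idx = start_idx
    for _ in range(n):
        idx = np.random.choice(len(states), p=P[idx])
        yield states[idx]

def projected_subgradient(subgrad, samples, x0, n_iters, c=0.1):
    """Stochastic projected subgradient method with step size c / sqrt(k+1).
    `subgrad(x, xi)` returns a stochastic subgradient of the objective at x
    evaluated on the (possibly dependent) sample xi."""
    x = x0.copy()
    for k, xi in zip(range(n_iters), samples):
        g = subgrad(x, xi)
        x = project_box(x - (c / np.sqrt(k + 1)) * g)
    return x

# Toy usage: minimize E|a^T x - b| (weakly convex) where (a, b) comes from a
# two-state Markov chain, so consecutive samples are dependent.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = [(np.array([1.0, 0.5]), 0.3), (np.array([-0.5, 1.0]), -0.2)]
    P = np.array([[0.9, 0.1], [0.1, 0.9]])  # slowly mixing chain

    def subgrad(x, xi):
        a, b = xi
        return np.sign(a @ x - b) * a

    x_final = projected_subgradient(
        subgrad, markov_chain_samples(P, states, 0, 10_000),
        x0=rng.standard_normal(2), n_iters=10_000,
    )
    print(x_final)
```

The near-stationarity measure mentioned above is standard for weakly convex problems: one asks for a point x with ‖∇f_λ(x)‖ ≤ ε, where f_λ(x) = min_y { f(y) + (1/2λ)‖y − x‖² } is the Moreau envelope of the objective.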
