Optimal Finite-Sum Smooth Non-Convex Optimization with SARAH
The total complexity (measured as the total number of gradient computations) of a stochastic first-order optimization algorithm that finds a first-order stationary point of a finite-sum smooth nonconvex objective function F(w) = (1/n) ∑_{i=1}^{n} f_i(w) has been proven to be at least Ω(√n/ϵ), where ϵ denotes the attained accuracy E[‖∇F(w̃)‖²] ≤ ϵ of the output approximation w̃ (Fang et al., 2018). This paper is the first to show that this lower bound is tight for the class of variance-reduction methods that rely only on the assumption of a Lipschitz continuous gradient. We prove this complexity result for a slightly modified version of the SARAH algorithm of (Nguyen et al., 2017a;b), showing that SARAH is optimal and dominates all existing results. For convex optimization, we propose SARAH++, which achieves sublinear convergence for general convex problems and linear convergence for strongly convex problems; we also provide a practical version for which numerical experiments on various datasets show improved performance.