Tight Dimension Independent Lower Bound on Optimal Expected Convergence Rate for Diminishing Step Sizes in SGD
We study the convergence of Stochastic Gradient Descent (SGD) for strongly convex and smooth objective functions F. We prove a lower bound on the expected convergence rate that holds for any sequence of diminishing step sizes designed using only global knowledge, such as the fact that F is smooth and strongly convex and that the component functions are smooth and convex, possibly together with additional information. Our lower bound matches, within a factor of 32, the expected convergence rate of a sequence of step sizes recently proposed at ICML 2018, which is based on exactly such knowledge. This shows that the step sizes proposed in the ICML paper are close to optimal. Furthermore, we conclude that in order to construct step sizes that beat our lower bound, more detailed information about F must be known. Our work significantly improves over the state-of-the-art lower bound, which we show is another factor of 643 · d worse, where d is the dimension. We are the first to prove a lower bound that comes within a small constant, independent of any other problem-specific parameters, of an optimal solution.
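As an informal illustration of the setting studied here (not the paper's construction or the ICML 2018 schedule it analyzes), the sketch below runs SGD with a generic diminishing step size of the form η_t = 2/(μ(t + β)) on a strongly convex, smooth finite-sum least-squares objective. The objective, the constants μ, L, and the offset β = 4L/μ are assumptions chosen for illustration only.

```python
# Illustrative sketch only: SGD with a diminishing step size on a strongly
# convex finite-sum least-squares objective. The schedule eta_t = 2/(mu*(t+beta))
# and the offset beta = 4*L/mu are assumptions for illustration; they are not
# the step sizes analyzed or proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

n, d = 1000, 10                          # number of component functions, dimension
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

# F(w) = (1/n) * sum_i f_i(w),  f_i(w) = 0.5*(a_i^T w - b_i)^2 + 0.5*lam*||w||^2
lam = 0.1                                # ridge term makes F strongly convex
mu = lam                                 # crude strong-convexity constant of F
L = np.max(np.sum(A**2, axis=1)) + lam   # crude smoothness constant of the f_i

def stochastic_grad(w, i):
    """Gradient of a single component f_i at w."""
    a = A[i]
    return (a @ w - b[i]) * a + lam * w

w = np.zeros(d)
beta = 4.0 * L / mu                      # offset so the first steps are not too large
T = 50_000
for t in range(T):
    eta_t = 2.0 / (mu * (t + beta))      # diminishing step size, O(1/t) decay
    i = rng.integers(n)
    w -= eta_t * stochastic_grad(w, i)

# Compare against the exact minimizer of the regularized least-squares problem.
w_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
print("||w_T - w*||^2 =", np.sum((w - w_star) ** 2))
```

Under this kind of O(1/t) schedule, the expected squared distance to the minimizer decays at a rate governed by μ, L, and the gradient noise; the paper's lower bound constrains how fast any such globally designed schedule can make this quantity shrink.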