DTN: A Learning Rate Scheme with Convergence Rate of O(1/t) for SGD
We propose a novel diminishing learning rate scheme, coined Decreasing-Trend-Nature (DTN), which allows us to prove fast convergence of the Stochastic Gradient Descent (SGD) algorithm to a first-order stationary point for smooth general convex problems and for a class of nonconvex problems that includes neural network applications for classification. We are the first to prove that SGD with a diminishing learning rate achieves a convergence rate of O(1/t) for these problems. Our theory applies to neural network classification applications in a straightforward way.
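For intuition, the following is a minimal sketch of SGD with a generic diminishing step size of the form eta_t = eta0 / (1 + t). The decay rule, the function names, and the least-squares example are illustrative assumptions, not the paper's DTN scheme, which is not specified in this abstract.

```python
import numpy as np

def sgd_diminishing(grad_fn, w0, n_samples, n_epochs=10, eta0=0.1):
    """SGD with a diminishing step size eta_t = eta0 / (1 + t).

    grad_fn(w, i) returns the stochastic gradient of the i-th sample's
    loss at parameters w. The 1/t-style decay below is a generic
    illustrative schedule, not the DTN scheme from the paper.
    """
    w = np.asarray(w0, dtype=float)
    t = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(n_samples):
            eta_t = eta0 / (1.0 + t)   # diminishing learning rate
            w -= eta_t * grad_fn(w, i)
            t += 1
    return w

# Hypothetical usage: stochastic least squares on synthetic data
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = A @ rng.standard_normal(5) + 0.01 * rng.standard_normal(100)
grad = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_hat = sgd_diminishing(grad, np.zeros(5), n_samples=100)
```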