On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods
In this study, we demonstrate that the norm test and the inner product/orthogonality test presented in <cit.> are equivalent in terms of the convergence rates of Stochastic Gradient Descent (SGD) methods, provided ϵ^2 = θ^2 + ν^2 for specific choices of θ and ν. Here, ϵ controls the relative statistical error of the norm of the gradient, while θ and ν control the relative statistical error of the gradient along the direction of the gradient and in the direction orthogonal to it, respectively. Furthermore, we show that the inner product/orthogonality test can be as inexpensive as the norm test in the best-case scenario, when θ and ν are chosen optimally, but that it can never be computationally cheaper than the norm test whenever ϵ^2 = θ^2 + ν^2. Finally, we present two stochastic optimization problems that illustrate these results.
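The link between the two tests rests on a Pythagorean decomposition: the variance of a per-sample gradient splits exactly into its variance along the mini-batch gradient and its variance orthogonal to it, which is why the norm test with ϵ^2 = θ^2 + ν^2 matches the combined inner product and orthogonality tests. Below is a minimal sketch of how these batch-size acceptance tests could be evaluated from per-sample gradients; the function name, the NumPy representation of the gradients, and the use of Bessel-corrected sample variances are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def batch_tests(per_sample_grads, eps, theta, nu):
    """Sketch: evaluate sample versions of the norm test and the
    inner product/orthogonality test for a mini-batch of size b.

    per_sample_grads: array of shape (b, d), one gradient per sample.
    eps, theta, nu: the tolerances ϵ, θ, ν from the abstract.
    Returns (norm_test_ok, inner_orthogonality_test_ok).
    """
    b, d = per_sample_grads.shape
    g = per_sample_grads.mean(axis=0)      # mini-batch gradient
    g_norm2 = np.dot(g, g)                 # ||g||^2 (assumed nonzero)

    # Norm test: the sample variance of the gradients, scaled by 1/b,
    # must not exceed eps^2 * ||g||^2.
    diffs = per_sample_grads - g
    var_norm = (diffs ** 2).sum() / (b - 1)
    norm_ok = var_norm / b <= eps**2 * g_norm2

    # Inner product test: the sample variance of g_i^T g, scaled by
    # 1/b, must not exceed theta^2 * ||g||^4.
    inner = per_sample_grads @ g           # g_i^T g for each sample
    var_inner = ((inner - g_norm2) ** 2).sum() / (b - 1)
    inner_ok = var_inner / b <= theta**2 * g_norm2**2

    # Orthogonality test: the sample variance of the component of g_i
    # orthogonal to g, scaled by 1/b, must not exceed nu^2 * ||g||^2.
    proj = np.outer(inner / g_norm2, g)    # projection of each g_i onto g
    var_orth = ((per_sample_grads - proj) ** 2).sum() / (b - 1)
    orth_ok = var_orth / b <= nu**2 * g_norm2

    return norm_ok, inner_ok and orth_ok
```

Under this decomposition, var_norm equals var_inner / ||g||^2 plus var_orth, so accepting the inner product and orthogonality tests with tolerances θ and ν implies accepting the norm test with ϵ^2 = θ^2 + ν^2, consistent with the equivalence stated above.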