Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

04/14/2022
by Rama Cont et al.

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
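The abstract does not specify the exact parameterization used in the paper, so the following is only a minimal illustrative sketch of the kind of setup it describes: a residual network of constant width with a smooth (tanh) activation, trained by full-batch gradient descent on a least-squares loss. The residual update x_{l+1} = x_l + (1/L) W_l tanh(x_l), the 1/L depth scaling, and all hyperparameters below are assumptions for illustration, not the authors' construction.

```python
# Illustrative sketch only: constant-width residual network with a smooth
# activation, trained by plain gradient descent on a toy regression task.
import jax
import jax.numpy as jnp

DEPTH, WIDTH, N_SAMPLES, LR, STEPS = 50, 16, 32, 0.1, 500

def init_params(key):
    # One weight matrix per residual block; the width is constant across layers.
    keys = jax.random.split(key, DEPTH)
    return [jax.random.normal(k, (WIDTH, WIDTH)) / jnp.sqrt(WIDTH) for k in keys]

def forward(params, x):
    # Residual update with smooth (tanh) activation; the 1/DEPTH factor is an
    # assumed scaling so that the map stays well-behaved as DEPTH grows.
    for W in params:
        x = x + (1.0 / DEPTH) * W @ jnp.tanh(x)
    return x

def loss(params, X, Y):
    preds = jax.vmap(lambda x: forward(params, x))(X)
    return 0.5 * jnp.mean(jnp.sum((preds - Y) ** 2, axis=-1))

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X = jax.random.normal(k1, (N_SAMPLES, WIDTH))   # synthetic inputs
Y = jax.random.normal(k2, (N_SAMPLES, WIDTH))   # synthetic targets
params = init_params(k3)

grad_fn = jax.jit(jax.value_and_grad(loss))
for step in range(STEPS):
    value, grads = grad_fn(params, X, Y)
    # Full-batch gradient descent step on every layer's weights.
    params = [W - LR * g for W, g in zip(params, grads)]
    if step % 100 == 0:
        print(step, float(value))
```

In this toy setting one can also inspect the trained weights as a function of the layer index (e.g. plot a norm of W_l against l/DEPTH for increasing DEPTH) to get an informal picture of the scaling-limit behaviour the paper analyzes.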
