Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

04/14/2022
by Rama Cont et al.

We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is Hölder continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.
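The abstract does not specify the exact parameterization used in the paper, so the following is only a minimal illustrative sketch of the kind of setup it describes: a residual network of constant width with a smooth (tanh) activation, trained by full-batch gradient descent on a least-squares loss. The residual update x_{l+1} = x_l + (1/L) W_l tanh(x_l), the 1/L depth scaling, and all hyperparameters below are assumptions for illustration, not the authors' construction.

```python
# Illustrative sketch only: constant-width residual network with a smooth
# activation, trained by plain gradient descent on a toy regression task.
import jax
import jax.numpy as jnp

DEPTH, WIDTH, N_SAMPLES, LR, STEPS = 50, 16, 32, 0.1, 500

def init_params(key):
    # One weight matrix per residual block; the width is constant across layers.
    keys = jax.random.split(key, DEPTH)
    return [jax.random.normal(k, (WIDTH, WIDTH)) / jnp.sqrt(WIDTH) for k in keys]

def forward(params, x):
    # Residual update with smooth (tanh) activation; the 1/DEPTH factor is an
    # assumed scaling so that the map stays well-behaved as DEPTH grows.
    for W in params:
        x = x + (1.0 / DEPTH) * W @ jnp.tanh(x)
    return x

def loss(params, X, Y):
    preds = jax.vmap(lambda x: forward(params, x))(X)
    return 0.5 * jnp.mean(jnp.sum((preds - Y) ** 2, axis=-1))

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X = jax.random.normal(k1, (N_SAMPLES, WIDTH))   # synthetic inputs
Y = jax.random.normal(k2, (N_SAMPLES, WIDTH))   # synthetic targets
params = init_params(k3)

grad_fn = jax.jit(jax.value_and_grad(loss))
for step in range(STEPS):
    value, grads = grad_fn(params, X, Y)
    # Full-batch gradient descent step on every layer's weights.
    params = [W - LR * g for W, g in zip(params, grads)]
    if step % 100 == 0:
        print(step, float(value))
```

In this toy setting one can also inspect the trained weights as a function of the layer index (e.g. plot a norm of W_l against l/DEPTH for increasing DEPTH) to get an informal picture of the scaling-limit behaviour the paper analyzes.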
