On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths

01/24/2021
by Quynh Nguyen, et al.

This paper studies the global convergence of gradient descent for deep ReLU networks under the square loss. In this setting, the current state-of-the-art results show that gradient descent converges to a global optimum if the widths of all hidden layers scale at least as Ω(N^8), where N is the number of training samples. In this paper, we discuss a simple proof framework that allows us to improve the existing over-parameterization condition to linear, quadratic, and cubic widths, depending on the initialization scheme and/or the depth of the network.
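To make the setting concrete, the sketch below trains a ReLU network with square loss by full-batch gradient descent, with the hidden width set proportional to the number of samples N (the "linear width" regime discussed in the abstract). This is a minimal illustration, not the paper's construction or proof: it uses a single hidden layer for brevity, and the width factor, step size, and initialization scale are hypothetical choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: N samples in d dimensions (hypothetical sizes).
N, d = 100, 10
X = rng.standard_normal((N, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)

# Hidden width scales linearly with N (illustrative constant factor).
m = 4 * N
W = rng.standard_normal((m, d)) / np.sqrt(d)   # first-layer weights
v = rng.standard_normal(m) / np.sqrt(m)        # output-layer weights

lr = 1e-3          # step size (hypothetical)
num_steps = 2000   # number of gradient descent iterations

for step in range(num_steps):
    # Forward pass: one-hidden-layer ReLU network f(x) = v^T relu(W x).
    pre = X @ W.T              # (N, m) pre-activations
    H = np.maximum(pre, 0.0)   # ReLU activations
    f = H @ v                  # network outputs, shape (N,)

    # Square loss: L = 1/2 * sum_i (f(x_i) - y_i)^2
    r = f - y
    loss = 0.5 * np.dot(r, r)

    # Backward pass (full-batch gradients of the square loss).
    grad_v = H.T @ r                           # (m,)
    dH = np.outer(r, v) * (pre > 0.0)          # (N, m), ReLU derivative mask
    grad_W = dH.T @ X                          # (m, d)

    # Plain gradient descent update.
    v -= lr * grad_v
    W -= lr * grad_W

    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.4f}")
```

With sufficient over-parameterization (here, m growing with N), gradient descent on such networks empirically drives the training loss toward zero; the paper's contribution is a proof framework showing this happens globally for deep ReLU networks under much weaker width requirements than the previous Ω(N^8) condition.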

