Gradient Descent Happens in a Tiny Subspace

12/12/2018
by Guy Gur-Ari, et al.

We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal in number to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning.
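The quantity at stake can be made concrete: at a given training step, how much of the gradient lies in the span of the top-k Hessian eigenvectors (with k roughly the number of classes)? The sketch below is not the authors' code; it uses a synthetic Hessian and gradient as stand-ins, and simply illustrates how such an overlap fraction could be computed.

```python
# Minimal sketch (synthetic data, not the paper's experiments): measure the
# fraction of the gradient's squared norm captured by the top-k Hessian eigenspace.
import numpy as np

def top_subspace_overlap(hessian, grad, k):
    """Fraction of the gradient's squared norm lying in the span of the
    top-k eigenvectors (largest eigenvalues) of the Hessian."""
    eigvals, eigvecs = np.linalg.eigh(hessian)   # eigenvalues in ascending order
    top_k = eigvecs[:, -k:]                      # columns spanning the top-k subspace
    projected = top_k @ (top_k.T @ grad)         # orthogonal projection of the gradient
    return np.dot(projected, projected) / np.dot(grad, grad)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, k = 1000, 10                            # k plays the role of the number of classes

    # Synthetic Hessian with k dominant directions plus a random symmetric bulk,
    # loosely mimicking the kind of spectrum the abstract describes.
    top_dirs = np.linalg.qr(rng.normal(size=(dim, k)))[0]
    bulk = rng.normal(size=(dim, dim)) / np.sqrt(dim)
    hessian = top_dirs @ np.diag(np.linspace(50.0, 100.0, k)) @ top_dirs.T + (bulk + bulk.T) / 2

    # Synthetic gradient that mostly lives in the dominant directions.
    grad = top_dirs @ rng.normal(size=k) + 0.1 * rng.normal(size=dim)

    print(f"overlap with top-{k} Hessian subspace: {top_subspace_overlap(hessian, grad, k):.3f}")
```

In an actual experiment one would replace the synthetic Hessian and gradient with those of the trained network (e.g. via Hessian-vector products and a Lanczos-style top-k eigensolver, since forming the full Hessian is infeasible at scale) and track this overlap over the course of training.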
