The success of SGD in deep learning has been ascribed by prior works to ...
Recent works have demonstrated that neural networks exhibit extreme simp...
Recent papers have shown that sufficiently overparameterized neural netw...
We analyze the inductive bias of gradient descent for weight normalized ...