The ability to extrapolate from short problem instances to longer ones i...
Language models have achieved remarkable performance on a wide range of ...
Large language models have been shown to achieve remarkable performance ...
Large pre-trained language models perform remarkably well on tasks that ...
Empirical studies demonstrate that the performance of neural networks im...
We study the role of L_2 regularization in deep learning, and uncover si...
We consider an existing conjecture addressing the asymptotic behavior of...
The choice of initial learning rate can have a profound effect on the pe...
Transferability of learned features between tasks can massively reduce t...
Understanding the asymptotic behavior of wide networks is of considerabl...
We show that in a variety of large-scale deep learning scenarios the gra...