We present unit scaling, a paradigm for designing deep learning models t...
We present the award-winning submission to the WikiKG90Mv2 track of
OGB-...
Given the current trend of increasing size and complexity of machine lea...
Identifying algorithms for computational efficient unsupervised training...
Attention based language models have become a critical component in
stat...
We investigate the reasons for the performance degradation incurred with...
Much recent research has been dedicated to improving the efficiency of
t...
Deep learning models trained on large data sets have been widely success...
Stochastic Gradient Descent (SGD) has proven to be remarkably effective ...
Modern deep neural network training is typically based on mini-batch
sto...