Transformers are central to recent successes in natural language process...
Pretraining on a large-scale corpus has become a standard method to buil...
While large language models (LLMs) have made impressive progress in natur...
In this paper we share findings from our effort to build practical machi...
Sparsely-activated Mixture-of-experts (MoE) models allow the number of p...
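As a rough, self-contained illustration of the idea (a minimal sketch, not the implementation from the paper above; all array names and sizes are made up), the NumPy snippet below routes each token to a single expert with a softmax router, so adding experts adds parameters without adding per-token compute:

    # Minimal sketch of a sparsely-activated MoE layer with top-1 gating.
    import numpy as np

    rng = np.random.default_rng(0)
    num_tokens, d_model, d_hidden, num_experts = 8, 16, 32, 4

    # Illustrative expert and router weights (not from the paper).
    w_in = rng.normal(size=(num_experts, d_model, d_hidden))
    w_out = rng.normal(size=(num_experts, d_hidden, d_model))
    w_router = rng.normal(size=(d_model, num_experts))

    x = rng.normal(size=(num_tokens, d_model))

    # Router: softmax over experts, pick the top-1 expert per token.
    logits = x @ w_router
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    expert_id = probs.argmax(axis=-1)

    # Dispatch: each expert only processes the tokens routed to it.
    y = np.zeros_like(x)
    for e in range(num_experts):
        idx = np.nonzero(expert_id == e)[0]
        if idx.size == 0:
            continue
        h = np.maximum(x[idx] @ w_in[e], 0.0)          # expert FFN with ReLU
        y[idx] = (h @ w_out[e]) * probs[idx, e:e + 1]  # scale by gate probability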
Scale has opened new frontiers in natural language processing – but at a...
Scaling language models with more data, compute and parameters has drive...
We summarize the results of a host of efforts using giant automatic spee...
We present GSPMD, an automatic, compiler-based parallelization system fo...
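GSPMD itself is a compiler pass inside XLA that propagates sharding annotations through the whole computation graph; the snippet below is only a conceptual sketch of the SPMD partitioning it produces, with a Python list standing in for devices and a row-sharded matrix multiply as the example (all names are illustrative):

    # Conceptual sketch of SPMD partitioning, not the GSPMD API.
    import numpy as np

    rng = np.random.default_rng(0)
    num_devices = 4
    x = rng.normal(size=(8, 16))   # activations, sharded over rows (the batch)
    w = rng.normal(size=(16, 32))  # weights, replicated on every device

    # "Annotate" x as row-sharded: split it into one shard per device.
    x_shards = np.split(x, num_devices, axis=0)

    # Every device runs the same program on its own shard (SPMD).
    y_shards = [shard @ w for shard in x_shards]

    # The un-partitioned result is recovered by concatenating the shards.
    y = np.concatenate(y_shards, axis=0)
    assert np.allclose(y, x @ w)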
The vast majority of deep models use multiple gradient signals, typicall...
Neural network scaling has been critical for improving the model quality...
Lingvo is a Tensorflow framework offering a complete solution for collab...
GPipe is a scalable pipeline parallelism library that enables learning o...
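To make the micro-batching idea concrete (a hedged sketch of pipeline parallelism in general, not the GPipe API), the snippet below partitions a model layer-wise into stages, splits a mini-batch into micro-batches, and runs a simple schedule in which stage s processes micro-batch t - s at clock tick t, so several stages are busy at once:

    # Sketch of micro-batched pipeline parallelism (illustrative names only).
    import numpy as np

    rng = np.random.default_rng(0)
    num_stages, num_micro = 3, 4

    # Hypothetical per-stage weights: the model is partitioned layer-wise.
    stage_w = [rng.normal(size=(16, 16)) for _ in range(num_stages)]
    batch = rng.normal(size=(8, 16))
    micro_batches = np.split(batch, num_micro, axis=0)

    # Pipelined forward pass: at tick t, stage s works on micro-batch t - s.
    buffers = list(micro_batches)
    schedule = []
    for t in range(num_micro + num_stages - 1):
        active = []
        for s in range(num_stages):
            m = t - s
            if 0 <= m < num_micro:
                buffers[m] = np.maximum(buffers[m] @ stage_w[s], 0.0)
                active.append((s, m))
        schedule.append(active)  # (stage, micro-batch) pairs that ran in parallel

    out = np.concatenate(buffers, axis=0)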
The effort devoted to hand-crafting image classifiers has motivated the ...
Markov decision processes (MDPs) are a well studied framework for solvin...
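For readers less familiar with MDPs, the following value-iteration sketch (textbook material, assuming known transition probabilities P[s, a, s'] and rewards R[s, a]; not code from the paper above) shows how an optimal value function and a greedy policy are computed:

    # Value iteration for a small, randomly generated finite MDP.
    import numpy as np

    rng = np.random.default_rng(0)
    num_states, num_actions, gamma = 5, 3, 0.9

    P = rng.random(size=(num_states, num_actions, num_states))
    P /= P.sum(axis=-1, keepdims=True)       # rows are probability distributions
    R = rng.random(size=(num_states, num_actions))

    V = np.zeros(num_states)
    for _ in range(1000):
        Q = R + gamma * P @ V                # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=-1)               # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new

    policy = Q.argmax(axis=-1)               # greedy policy w.r.t. the final V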
Deep learning methods have shown great promise in many practical applica...