Existing large language models have to run K times to generate a sequenc...
Sparsely gated Mixture-of-Expert (MoE) has demonstrated its effectivenes...
The accuracy of AlphaFold2, a frontier end-to-end structure prediction
s...
Pre-training models are an important tool in Natural Language Processing...
Accurate protein structure prediction can significantly accelerate the
d...
With the increasing diversity of ML infrastructures nowadays, distribute...
The ever-growing model size and scale of compute have attracted increasi...
Conventional methods for the image-text generation tasks mainly tackle t...
Pre-trained language models have achieved state-of-the-art results in va...
Distributed training has become a pervasive and effective approach for
t...
Deep neural networks (DNNs) exploit many layers and a large number of
pa...
To explore the limit of dialogue generation pre-training, we present the...
Pre-trained models have achieved state-of-the-art results in various Nat...
Deploying deep learning models on mobile devices draws more and more
att...
Federated learning is a new distributed machine learning framework, wher...
Federated learning was proposed with an intriguing vision of achieving
c...