In this work, we develop and release Llama 2, a collection of pretrained...
Machine Translation (MT) has been widely used for cross-lingual classifi...
Mixture-of-Experts (MoE) models have gained popularity in achieving stat...
Sparsely gated Mixture of Experts (MoE) models have been shown to be a c...
Multilingual machine translation models can benefit from synergy between...
Driven by the goal of eradicating language barriers on a global scale, m...
Multilingual machine translation suffers from negative interference acro...
Neural Machine Translation (NMT) models are typically trained on heterog...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
Large-scale autoregressive language models such as GPT-3 are few-shot le...
Multi-task learning with an unbalanced data distribution skews model lea...
We describe Facebook's multilingual model submission to the WMT2021 shar...
We introduce a new balanced assignment of experts (BASE) layer for large...
Pre-training models on vast quantities of unlabeled data has emerged as ...
Existing work in translation demonstrated the potential of massively mul...