Generative Large Language Models (LLMs) have achieved remarkable advance...
Large Language Models (LLMs) have achieved state-of-the-art performance...
Generative Pre-trained Transformer (GPT) models have shown remarkable capabilities...
Mixture of Experts (MoE) models with conditional execution of sparsely activated...
This paper proposes a simple yet effective method to improve direct (X-to-Y) translation...
Multilingual Neural Machine Translation (MNMT) enables one system to translate...
Sparsely activated transformers, such as Mixture of Experts (MoE), have...
This paper describes our submission to the constrained track of WMT21 sh...
This report describes Microsoft's machine translation systems for the WM...
The Mixture of Experts (MoE) models are an emerging class of sparsely activated...
While pretrained encoders have achieved success in various natural language...
Multilingual machine translation enables a single model to translate between...
This paper describes our submission to the WMT20 sentence filtering task...
Transformer-based models are the state-of-the-art for Natural Language Understanding...
While monolingual data has been shown to be useful in improving bilingual...
In this paper, we explore different neural network architectures that ca...