In this work, we develop and release Llama 2, a collection of pretrained...
Machine Translation (MT) has been widely used for cross-lingual classifi...
Mixture-of-Experts (MoE) models have gained popularity in achieving stat...
Sparsely gated Mixture of Experts (MoE) models have been shown to be a c...
Multilingual machine translation models can benefit from synergy between...
Driven by the goal of eradicating language barriers on a global scale, m...
Multilingual machine translation suffers from negative interference acro...
Neural Machine Translation (NMT) models are typically trained on heterog...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
Large-scale autoregressive language models such as GPT-3 are few-shot le...
Multi-task learning with an unbalanced data distribution skews model lea...
We describe Facebook's multilingual model submission to the WMT2021 shar...
We introduce a new balanced assignment of experts (BASE) layer for large...
Pre-training models on vast quantities of unlabeled data has emerged as ...
Existing work in translation demonstrated the potential of massively mul...