Mohammad Shoeybi

research

∙ 08/15/2023

RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models

In this paper, we investigate the in-context learning ability of retriev...

0 Jie Huang, et al. ∙

research

∙ 04/13/2023

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

Large decoder-only language models (LMs) can be largely improved in term...

16 Boxin Wang, et al. ∙

research

∙ 02/14/2023

Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models

Pretrained large language models have become indispensable for solving v...

0 Shrimai Prabhumoye, et al. ∙

research

∙ 02/09/2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Augmenting pretrained language models (LMs) with a vision encoder (e.g.,...

0 Zhuolin Yang, et al. ∙

research

∙ 10/25/2022

Evaluating Parameter Efficient Learning for Generation

Parameter efficient learning methods (PERMs) have recently gained signif...

0 Peng Xu, et al. ∙

research

∙ 10/12/2022

Context Generation Improves Open Domain Question Answering

Closed-book question answering (QA) requires a model to directly answer ...

12 Dan Su, et al. ∙

research

∙ 10/06/2022

Prompt Compression and Contrastive Conditioning for Controllability and Toxicity Reduction in Language Models

We explore the idea of compressing the prompts used to condition languag...

0 David Wingate, et al. ∙

research

∙ 09/12/2022

FP8 Formats for Deep Learning

FP8 is a natural progression for accelerating deep learning training inf...

0 Paulius Micikevicius, et al. ∙

research

∙ 06/09/2022

Factuality Enhanced Language Models for Open-Ended Text Generation

Pretrained language models (LMs) are susceptible to generate text with n...

2 Nayeon Lee, et al. ∙

research

∙ 05/10/2022

Reducing Activation Recomputation in Large Transformer Models

Training large transformer models is one of the most important computati...

0 Vijay Korthikanti, et al. ∙

research

∙ 03/16/2022

Multi-Stage Prompting for Knowledgeable Dialogue Generation

Existing knowledge-grounded dialogue systems typically use finetuned ver...

0 Zihan Liu, et al. ∙

research

∙ 02/08/2022

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Pre-trained language models (LMs) are shown to easily generate toxic lan...

16 Boxin Wang, et al. ∙

research

∙ 01/28/2022

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

Pretrained general-purpose language models can achieve state-of-the-art ...

8 Shaden Smith, et al. ∙

research

∙ 12/15/2021

Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases

Detecting social bias in text is challenging due to nuance, subjectivity...

22 Shrimai Prabhumoye, et al. ∙

research

∙ 07/05/2021

Long-Short Transformer: Efficient Transformers for Language and Vision

Transformers have achieved success in both language and vision domains. ...

12 Chen Zhu, et al. ∙

research

∙ 04/09/2021

Efficient Large-Scale Language Model Training on GPU Clusters

Large language models have led to state-of-the-art accuracies across a r...

8 Deepak Narayanan, et al. ∙

research

∙ 01/02/2021

End-to-End Training of Neural Retrievers for Open-Domain Question Answering

Recent work on training neural retrievers for open-domain question answe...

0 Devendra Singh Sachan, et al. ∙

research

∙ 10/20/2020

Local Knowledge Powered Conversational Agents

State-of-the-art conversational agents have advanced significantly in co...

0 Sashank Santhanam, et al. ∙

research

∙ 10/12/2020

BioMegatron: Larger Biomedical Domain Language Model

There has been an influx of biomedical domain-specific language models, ...

11 Hoo-chang Shin, et al. ∙

research

∙ 10/02/2020

MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Existing pre-trained large language models have shown unparalleled gener...

0 Peng Xu, et al. ∙

research

∙ 05/13/2020

Large Scale Multi-Actor Generative Dialog Modeling

Non-goal oriented dialog agents (i.e. chatbots) aim to produce varying a...

0 Alex Boyd, et al. ∙

research

∙ 03/02/2020

Style Example-Guided Text Generation using Generative Adversarial Transformers

We introduce a language generative model framework for generating a styl...

0 Kuo-Hao Zeng, et al. ∙

research

∙ 02/22/2020

Training Question Answering Models From Synthetic Data

Question and answer generation is a data augmentation method that aims t...

16 Raul Puri, et al. ∙

research

∙ 12/25/2019

Neural ODEs for Image Segmentation with Level Sets

We propose a novel approach for image segmentation that combines Neural ...

6 Rafael Valle, et al. ∙

research

∙ 09/17/2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Recent work in unsupervised language modeling demonstrates that training...

0 Mohammad Shoeybi, et al. ∙

research

∙ 09/17/2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using GPU Model Parallelism

Recent work in unsupervised language modeling demonstrates that training...

0 Mohammad Shoeybi, et al. ∙

research

∙ 06/13/2019

Unsupervised Video Interpolation Using Cycle Consistency

Learning to synthesize high frame rate videos via interpolation requires...

1 Fitsum A. Reda, et al. ∙

research

∙ 10/25/2017

Trace norm regularization and faster inference for embedded speech recognition RNNs

We propose and evaluate new techniques for compressing and speeding up d...

0 Markus Kliegl, et al. ∙

research

∙ 02/25/2017

Deep Voice: Real-time Neural Text-to-Speech

We present Deep Voice, a production-quality text-to-speech system constr...

0 Sercan O. Arik, et al. ∙

