ChatGPT-like models have revolutionized various applications in artifici...
In recent years, the training requirements of many state-of-the-art Deep...
Mixture-of-Experts (MoE) is a neural network architecture that adds spar...
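Since the excerpt above describes MoE as an architecture that adds sparsity, here is a minimal sketch of that idea, assuming a PyTorch-style layer with a simple top-1 gate; the class and parameter names (SimpleMoE, num_experts, d_hidden) are illustrative only and not taken from any particular library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Toy Mixture-of-Experts layer: a gating network picks one expert per
    token, so only a fraction of the layer's parameters are active for any
    given input."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)       # routing probabilities
        top_prob, top_idx = scores.max(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                        # tokens routed to expert e
            if mask.any():
                out[mask] = top_prob[mask, None] * expert(x[mask])
        return out

moe = SimpleMoE(d_model=16, d_hidden=64, num_experts=4)
tokens = torch.randn(8, 16)
print(moe(tokens).shape)   # torch.Size([8, 16])
```

Adding experts grows total parameter count, but each token still passes through only one expert, which is the sparsity the excerpt refers to.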
The past several years have witnessed the success of transformer-based m...
As the training of giant dense models hits the boundary on the availabil...
The Mixture of Experts (MoE) models are an emerging class of sparsely ac...
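To make the "sparsely activated" phrasing concrete, the following is a hedged sketch of top-k routing as it appears in common MoE formulations, assuming top-2 gating; the function name top_k_dispatch is hypothetical.

```python
import torch
import torch.nn.functional as F

def top_k_dispatch(logits: torch.Tensor, k: int = 2):
    """Given router logits of shape (tokens, num_experts), return the k
    experts chosen per token and their renormalized combine weights.
    Only these k experts run for each token, so per-token compute scales
    with k, not with the total number of experts in the model."""
    probs = F.softmax(logits, dim=-1)
    top_p, top_i = probs.topk(k, dim=-1)                  # (tokens, k)
    top_p = top_p / top_p.sum(dim=-1, keepdim=True)       # renormalize over the k picks
    return top_i, top_p

# 6 tokens routed over 8 experts, but each token activates only 2 of them.
logits = torch.randn(6, 8)
experts, weights = top_k_dispatch(logits, k=2)
print(experts)   # chosen expert ids, shape (6, 2)
print(weights)   # combine weights, each row sums to 1
```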
To train large models (like BERT and GPT-3) with hundreds or even thousa...
Scalable training of large models (like BERT and GPT-3) requires careful...
The enormous amount of data and computation required to train DNNs has ...
TensorFlow has been the most widely adopted Machine/Deep Learning framew...