Transformers are central to recent successes in natural language process...
Pretraining on a large-scale corpus has become a standard method to buil...
While large language models (LLMs) have made impressive progress in natur...
In this paper we share findings from our effort to build practical machi...
Sparsely-activated Mixture-of-experts (MoE) models allow the number of p...
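As a rough, self-contained illustration of the idea (a minimal sketch, not the implementation from the paper above; all array names and sizes are made up), the NumPy snippet below routes each token to a single expert with a softmax router, so adding experts adds parameters without adding per-token compute:

    # Minimal sketch of a sparsely-activated MoE layer with top-1 gating.
    import numpy as np

    rng = np.random.default_rng(0)
    num_tokens, d_model, d_hidden, num_experts = 8, 16, 32, 4

    # Illustrative expert and router weights (not from the paper).
    w_in = rng.normal(size=(num_experts, d_model, d_hidden))
    w_out = rng.normal(size=(num_experts, d_hidden, d_model))
    w_router = rng.normal(size=(d_model, num_experts))

    x = rng.normal(size=(num_tokens, d_model))

    # Router: softmax over experts, pick the top-1 expert per token.
    logits = x @ w_router
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    expert_id = probs.argmax(axis=-1)

    # Dispatch: each expert only processes the tokens routed to it.
    y = np.zeros_like(x)
    for e in range(num_experts):
        idx = np.nonzero(expert_id == e)[0]
        if idx.size == 0:
            continue
        h = np.maximum(x[idx] @ w_in[e], 0.0)          # expert FFN with ReLU
        y[idx] = (h @ w_out[e]) * probs[idx, e:e + 1]  # scale by gate probability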
Scale has opened new frontiers in natural language processing – but at a...
Scaling language models with more data, compute and parameters has drive...
We summarize the results of a host of efforts using giant automatic spee...
We present GSPMD, an automatic, compiler-based parallelization system fo...
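GSPMD itself is a compiler pass inside XLA that propagates sharding annotations through the whole computation graph; the snippet below is only a conceptual sketch of the SPMD partitioning it produces, with a Python list standing in for devices and a row-sharded matrix multiply as the example (all names are illustrative):

    # Conceptual sketch of SPMD partitioning, not the GSPMD API.
    import numpy as np

    rng = np.random.default_rng(0)
    num_devices = 4
    x = rng.normal(size=(8, 16))   # activations, sharded over rows (the batch)
    w = rng.normal(size=(16, 32))  # weights, replicated on every device

    # "Annotate" x as row-sharded: split it into one shard per device.
    x_shards = np.split(x, num_devices, axis=0)

    # Every device runs the same program on its own shard (SPMD).
    y_shards = [shard @ w for shard in x_shards]

    # The un-partitioned result is recovered by concatenating the shards.
    y = np.concatenate(y_shards, axis=0)
    assert np.allclose(y, x @ w)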
The vast majority of deep models use multiple gradient signals, typicall...
Neural network scaling has been critical for improving the model quality...
Lingvo is a Tensorflow framework offering a complete solution for collab...
GPipe is a scalable pipeline parallelism library that enables learning o...
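To make the micro-batching idea concrete (a hedged sketch of pipeline parallelism in general, not the GPipe API), the snippet below partitions a model layer-wise into stages, splits a mini-batch into micro-batches, and runs a simple schedule in which stage s processes micro-batch t - s at clock tick t, so several stages are busy at once:

    # Sketch of micro-batched pipeline parallelism (illustrative names only).
    import numpy as np

    rng = np.random.default_rng(0)
    num_stages, num_micro = 3, 4

    # Hypothetical per-stage weights: the model is partitioned layer-wise.
    stage_w = [rng.normal(size=(16, 16)) for _ in range(num_stages)]
    batch = rng.normal(size=(8, 16))
    micro_batches = np.split(batch, num_micro, axis=0)

    # Pipelined forward pass: at tick t, stage s works on micro-batch t - s.
    buffers = list(micro_batches)
    schedule = []
    for t in range(num_micro + num_stages - 1):
        active = []
        for s in range(num_stages):
            m = t - s
            if 0 <= m < num_micro:
                buffers[m] = np.maximum(buffers[m] @ stage_w[s], 0.0)
                active.append((s, m))
        schedule.append(active)  # (stage, micro-batch) pairs that ran in parallel

    out = np.concatenate(buffers, axis=0)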
The effort devoted to hand-crafting image classifiers has motivated the ...
Markov decision processes (MDPs) are a well studied framework for solvin...
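For readers less familiar with MDPs, the following value-iteration sketch (textbook material, assuming known transition probabilities P[s, a, s'] and rewards R[s, a]; not code from the paper above) shows how an optimal value function and a greedy policy are computed:

    # Value iteration for a small, randomly generated finite MDP.
    import numpy as np

    rng = np.random.default_rng(0)
    num_states, num_actions, gamma = 5, 3, 0.9

    P = rng.random(size=(num_states, num_actions, num_states))
    P /= P.sum(axis=-1, keepdims=True)       # rows are probability distributions
    R = rng.random(size=(num_states, num_actions))

    V = np.zeros(num_states)
    for _ in range(1000):
        Q = R + gamma * P @ V                # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=-1)               # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new

    policy = Q.argmax(axis=-1)               # greedy policy w.r.t. the final V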
Deep learning methods have shown great promise in many practical applica...