Ashish Vaswani

10/25/2021

The Efficiency Misnomer

Model efficiency is a critical aspect of developing and deploying machin...

Mostafa Dehghani, et al.

09/22/2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

There remain many open questions pertaining to the scaling behaviour of ...

Yi Tay, et al.

04/18/2021

Simple and Efficient ways to Improve REALM

Dense retrieval has been shown to be effective for retrieving relevant d...

Vidhisha Balachandran, et al.

03/23/2021

Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Self-attention has the promise of improving computer vision systems due ...

Ashish Vaswani, et al.

01/27/2021

Bottleneck Transformers for Visual Recognition

We present BoTNet, a conceptually simple yet powerful backbone architect...

Aravind Srinivas, et al.

03/12/2020

Efficient Content-Based Sparse Attention with Routing Transformers

Self-attention has recently been adopted for a wide range of sequence mo...

Aurko Roy, et al.

06/13/2019

Stand-Alone Self-Attention in Vision Models

Convolutions are a fundamental building block of modern computer vision ...

Prajit Ramachandran, et al.

05/29/2019

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

Advances in learning and representations have reinvigorated work that co...

Vihan Jain, et al.

04/22/2019

Attention Augmented Convolutional Networks

Convolutional networks have been the paradigm of choice in many computer...

Irwan Bello, et al.

11/05/2018

Mesh-TensorFlow: Deep Learning for Supercomputers

Batch-splitting (data-parallelism) is the dominant distributed Deep Neur...

Noam Shazeer, et al.

09/12/2018

An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation

Music relies heavily on self-reference to build structure and meaning. W...

Cheng-Zhi Anna Huang, et al.

09/12/2018

Music Transformer

Music relies heavily on repetition to build structure and meaning. Self-...

Cheng-Zhi Anna Huang, et al.

06/04/2018

Relational inductive biases, deep learning, and graph networks

Artificial intelligence (AI) has undergone a renaissance recently, makin...

Peter W. Battaglia, et al.

05/28/2018

Theory and Experiments on Vector Quantized Autoencoders

Deep neural networks with discrete latent variables offer the promise of...

Aurko Roy, et al.

03/16/2018

Tensor2Tensor for Neural Machine Translation

Tensor2Tensor is a library for deep learning models that is well-suited ...

Ashish Vaswani, et al.

03/09/2018

Fast Decoding in Sequence Models using Discrete Latent Variables

Autoregressive sequence models based on deep neural networks, such as RN...

Łukasz Kaiser, et al.

03/06/2018

Self-Attention with Relative Position Representations

Relying entirely on an attention mechanism, the Transformer introduced b...

Peter Shaw, et al.

02/15/2018

Image Transformer

Image generation has been successfully cast as an autoregressive sequenc...

Niki Parmar, et al.

06/16/2017

One Model To Learn Them All

Deep learning yields great results across many fields, from speech recog...

Łukasz Kaiser, et al.

06/12/2017

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent...

Ashish Vaswani, et al.