Given two images depicting a person and a garment worn by another person...
We introduce Noise2Music, where a series of diffusion models is trained ...
Current image generation models struggle to reliably produce well-formed...
We present 3DiM, a diffusion model for 3D novel view synthesis, which is...
We present Imagen Video, a text-conditional video generation system base...
We present Imagen, a text-to-image diffusion model with an unprecedented...
Generating temporally coherent high fidelity video is an important miles...
Diffusion models have emerged as an expressive family of generative mode...
We introduce Palette, a simple and general framework for image-to-image...
We summarize the results of a host of efforts using giant automatic spee...
This paper introduces WaveGrad 2, a non-autoregressive generative model ...
Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a power...
We present SR3, an approach to image Super-Resolution via Repeated Refin...
We combine recent advancements in end-to-end speech recognition to non-a...
We present SpeechStew, a speech recognition model that is trained on a c...
A channel corresponds to a viewpoint or transformation of an underlying...
This paper introduces WaveGrad, a conditional model for waveform generat...
This paper investigates two latent alignment models for non-autoregressi...
This paper presents the Imputer, a neural sequence model that generates...
We propose the Insertion-Deletion Transformer, a novel transformer-based...
Recently, SpecAugment, an augmentation scheme for automatic speech recog...
In this work, we present an empirical study of generation order for mach...
The Insertion Transformer is well suited for long form text generation d...
Speech recognition in cocktail-party environments remains a significant...
We present KERMIT, a simple insertion-based approach to generative model...
We present SpecAugment, a simple data augmentation method for speech rec...
Lingvo is a TensorFlow framework offering a complete solution for collab...
We present the Insertion Transformer, an iterative, partially autoregres...
We present a practical method for protecting data during the inference p...
We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio...
We present Optimal Completion Distillation (OCD), a training procedure f...
We present a state-of-the-art end-to-end Automatic Speech Recognition (A...
We present the Latent Sequence Decompositions (LSD) framework. LSD decom...
Sequence-to-sequence models have shown success in end-to-end speech reco...
We present Listen, Attend and Spell (LAS), a neural network that learns ...
Deep Neural Network (DNN) acoustic models have yielded many state-of-the...
We present a novel deep Recurrent Neural Network (RNN) model for acousti...