We work to create a multilingual speech synthesis system which can gener...
Human pose transfer aims to synthesize a new view of a person under a gi...
Despite recent advances in generative modeling for text-to-speech synthe...
Speech-to-text alignment is a critical component of neural text-to-speech...
Unsupervised landmark learning is the task of learning semantic keypoint...
Prediction and interpolation for long-range video data involves the comp...
Learning to synthesize high frame rate videos via interpolation requires...
Most scene graph generators use a two-stage pipeline to detect visual re...
Semantic segmentation requires large amounts of pixel-wise annotations t...
In this paper, we present a simple yet effective padding scheme that can...
Most existing work that grounds natural language phrases in images start...
We present an approach for high-resolution video frame prediction by con...
Existing deep learning based image inpainting methods use a standard con...
In this paper, we study the problem of mapping natural language instruct...
We present a simple deep learning framework to simultaneously predict ke...