Generating video stories from text prompts is a complex task. In additio...
Recent work in vision-and-language pretraining has investigated supervis...
While pretraining on large-scale image-text data from the Web has facili...
There has been a recent explosion of computer vision models which perfor...
Most vision-and-language pretraining research focuses on English tasks.
...
Language models are defined over a finite set of inputs, which creates a...
We aim to learn language models for Creole languages for which large vol...
Vision-and-language (V L) models pretrained on large-scale multimodal ...
We consider the task of sequencing tracks on music streaming platforms w...
Various efforts in the Natural Language Processing (NLP) community have ...
Creole languages such as Nigerian Pidgin English and Haitian Creole are
...
Pretrained vision-and-language BERTs aim to learn representations that
c...
Image captioning has focused on generalizing to images drawn from the sa...
Large-scale pretraining and task-specific fine-tuning is now the standar...
The performance of neural machine translation systems is commonly evalua...
Most neural machine translation (NMT) models operate on source and targe...
Several complex tasks that arise in organizations can be simplified by
m...