Massive data corpora like WebText, Wikipedia, Conceptual Captions,
WebIm...
The Visual Question Answering (VQA) task aspires to provide a meaningful...
Communicating with humans is challenging for AIs because it requires a s...
Mirroring the success of masked language models, vision-and-language
cou...
Visual recognition ecosystems (e.g. ImageNet, Pascal, COCO) have undenia...
The ubiquity of embodied gameplay, observed in a wide variety of animal
...
Imagining a scene described in natural language with realistic layout an...