Generative text-to-image (GTI) models produce high-quality images from s...
Captions that describe or explain charts help improve recall and
compreh...
Saliency methods calculate how important each input feature is to a mach...
In this paper, we explore self-supervised audio-visual models that learn...
Saliency methods – techniques to identify the importance of input featur...
Multimodal self-supervised learning is getting more and more attention a...
Current methods for learning visually grounded language from videos ofte...
Embeddings – mappings from high-dimensional discrete input to
lower-dime...