Text-VQA aims at answering questions that require understanding the text...
Despite the rapid progress in deep visual recognition, modern computer v...
We propose CLIP-Lite, an information efficient method for visual represe...
Videos are a rich source for self-supervised learning (SSL) of visual re...
Large-scale vision and language representation learning has shown promis...
Recent advances in self-supervised learning (SSL) have largely closed th...
Recent research in Visual Question Answering (VQA) has revealed state-of...
Existing VQA datasets contain questions with varying levels of complexit...
An approach to make text visually appealing and memorable is semantic re...
Many vision and language models suffer from poor visual grounding - ofte...
Individual neurons in convolutional neural networks supervised for image...
We propose a technique for making Convolutional Neural Network (CNN)-bas...
We propose a technique for producing "visual explanations" for decisions...
We are interested in counting the number of instances of object classes ...