Generalizing to unseen image domains is a challenging problem primarily ...
The ability to understand visual concepts and replicate and compose thes...
We investigate knowledge retrieval with multi-modal queries, i.e. querie...
In this work, we present a data poisoning attack that confounds machine
...
Videos often capture objects, their visible properties, their motion, an...
Data modification, either via additional training datasets, data
augment...
Information retrieval (IR) is essential in search engines and dialogue
s...
Recent work on unsupervised question answering has shown that models can...
Methodologies for training VQA models assume the availability of dataset...
While progress has been made on the visual question answering leaderboar...
We address the new problem of complex scene completion from sparse label...
Captioning is a crucial and challenging task for video understanding. In...
Logical connectives and their implications on the meaning of a natural
l...
The process of identifying changes or transformations in a scene along w...