Large pre-trained models have proved to be remarkable zero- and
(prompt-...
The ability to integrate context, including perceptual and temporal cues...
Music, an integral part of our lives, which is not only a source of
ente...
Musical preferences have been considered a mirror of the self. In this a...
While Visual Question Answering (VQA) models continue to push the
state-...
How can we understand classification decisions made by deep neural nets?...
A counterfactual query is typically of the form 'For situation X, why wa...
Problems at the intersection of vision and language are of significant
i...
Deep neural networks have shown striking progress and obtained
state-of-...
We present an approach to simultaneously perform semantic segmentation a...
The complex compositional structure of language makes problems at the
in...
We are witnessing a proliferation of massive visual data. Unfortunately
...