Answering visual queries is a complex task that requires both visual pro...
We introduce a video framework for modeling the association between verb...
Many visual recognition models are evaluated only on their classificatio...
Incidental supervision from language has become a popular approach for l...
Vision-language models (VLMs) such as CLIP have shown promising performa...
Variational autoencoders (VAEs) suffer from posterior collapse, where th...
3D reconstruction is a fundamental problem in computer vision, and the t...
The primary aim of single-image super-resolution is to construct a high-...
This work identifies and addresses two important technical challenges in...