Diffusion models (DMs) can generate realistic images with text guidance ...
Long-form video understanding requires designing approaches that are abl...
Cross-modal recipe retrieval has recently gained substantial attention d...
Image captioning is an interdisciplinary research problem that stands be...
This paper presents a novel framework for visual object recognition usin...
In many computer vision tasks, the relevant information to solve the pro...
This paper introduces self-taught object localization, a novel approach ...
This paper presents a general vector-valued reproducing kernel Hilbert s...
We discuss an attentional model for simultaneous object tracking and
rec...