Learning visual representations from natural language supervision has
re...
Multi-camera tracking systems are gaining popularity in applications tha...
Automated visual understanding of our diverse and open world demands com...
There is a surge of interest in image scene graph generation (object,
at...
Large-scale pre-training methods of learning cross-modal representations...
Query reformulation is the process by which an input search query is refi...
This paper presents a unified Vision-Language Pre-training (VLP) model. ...
Grounding language to visual relations is critical to various
language-a...
We propose a benchmark framework to rank image attractiveness using a no...
In this paper, we study the problem of image-text matching. Inferring th...
In this paper, we introduce a web-scale general visual search system dep...