Recent advancements in multimodal foundation models (e.g., CLIP) have ex...
Weakly supervised Referring Expression Grounding (REG) aims to ground a ...
Vehicle Re-Identification is to find images of the same vehicle from var...
Weakly supervised referring expression grounding (REG) aims at localizin...
Weakly supervised referring expression grounding aims at localizing the ...