The top-down and bottom-up methods are two mainstreams of referring segm...
Text-video retrieval is a challenging cross-modal task, which aims to al...
Interactive segmentation enables users to segment as needed by providing...
Existing text-video retrieval solutions are, in essence, discriminant mo...
Unified visual grounding pursues a simple and generic technical route to...
Weakly supervised semantic segmentation is typically inspired by class a...
Recently, the ability of self-supervised Vision Transformer (ViT) to rep...
3D visual grounding aims to find the objects within point clouds mention...