3D scene understanding has gained significant attention due to its wide ...
3D visual grounding aims to localize the target object in a 3D point clo...
3D visual grounding involves finding a target object in a 3D scene that ...
Multi-modal Contrastive Representation (MCR) learning aims to encode dif...