Accumulating substantial volumes of real-world driving data proves pivot...
We present a model that can perform multiple vision tasks and can be ada...
General physical scene understanding requires more than simply localizin...
Self-training is an important technique for solving semi-supervised lear...
Current LiDAR odometry, mapping and localization methods leverage point-...
Embodied control requires agents to leverage multi-modal pre-training to...
Humans, even at a very early age, can learn visual concepts and understa...
Humans possess a versatile mechanism for extracting structured represent...
Existing large language model-based code generation pipelines typically ...
Diffusion models have demonstrated their powerful generative capability ...
Optimization in multi-task learning (MTL) is more challenging than singl...
Neural Radiance Fields (NeRFs) have been successfully used for scene rep...
Transformer has achieved great successes in learning vision and language...
Objects' motions in nature are governed by complex interactions and thei...
In this work, we introduce Dual Attention Vision Transformers (DaViT), a...
We present a novel masked image modeling (MIM) approach, context autoenc...
In this work, we propose a unified framework, called Visual Reasoning wi...
Reducing the complexity of the pipeline of instance segmentation is cruc...
In this paper, we propose an efficient and effective dense hybrid recurr...
Transparent objects such as windows and bottles made of glass widely exi...
3D vehicle detection based on point cloud is a challenging task in real-...
3D object detection from a single image without LiDAR is a challenging t...
In this paper, we propose an effective and efficient pyramid multi-view ...
A major challenge for video semantic segmentation is the lack of labeled...
Automated deception detection (ADD) from real-life videos is a challengi...
Zero-shot learning (ZSL) aims to recognize unseen object classes without...