Large language models (LLMs) have pushed the limits of natural language ...
Diffusion Probabilistic Models (DPMs) have achieved considerable success...
Referring image segmentation aims to segment the target object referred ...
Chain-of-Thought (CoT) prompting has shown promising performance in vario...
Jointly processing information from multiple sensors is crucial to achie...
In generative modeling, numerous successful approaches leverage a low-di...
Domain generalization (DG) is about learning models that generalize well...
Despite the stunning ability to generate high-quality images by recent t...
Generative models can be categorized into two types: explicit generative...
Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerf...
Energy-Based Models (EBMs) have been widely used for generative modeling...
We introduce a new diffusion-based approach for shape completion on 3D r...
Diffusion models have attracted significant attention due to their remar...
The proliferation of pretrained models, as a result of advancements in p...
Due to the ease of training, ability to scale, and high sample quality, ...
The diffusion probabilistic generative models are widely used to generat...
Neural Radiance Fields (NeRF) has demonstrated remarkable 3D reconstruct...
The text-driven image and video diffusion models have achieved unprecede...
Large vision and language models, such as Contrastive Language-Image Pre...
Perception systems in modern autonomous driving vehicles typically take ...
The performance of Large Language Models (LLMs) in reasoning tasks depen...
Diffusion models have proven to be highly effective in generating high-q...
This paper presents DetCLIPv2, an efficient and scalable training framew...
Safety is the primary priority of autonomous driving. Nevertheless, no p...
We propose a simple, efficient, yet powerful framework for dense visual ...
Masked Autoencoder (MAE) has demonstrated superior performance on variou...
Although many recent works have investigated generalizable NeRF-based no...
Existing text-guided image manipulation methods aim to modify the appear...
Out-of-Distribution (OOD) detection, i.e., identifying whether an input ...
Vision-language pre-training (VLP) has attracted increasing attention re...
Recent advances in large-scale pre-training have shown great potential ...
Object detection for autonomous vehicles has received increasing attenti...
We present a novel two-stage fully sparse convolutional 3D object detect...
Open-world object detection, as a more general and challenging goal, aim...
Self-supervised depth learning from monocular images normally relies on ...
Recently, generalization on out-of-distribution (OOD) data with correlat...
Automatic theorem proving with deep learning methods has attracted atten...
Generative model based image lossless compression algorithms have seen a...
Unsupervised contrastive learning for indoor-scene point clouds has achi...
Self-supervised learning (SSL), especially contrastive methods, has rais...
Nowadays, owing to the superior capacity of the large pre-trained langua...
Neural Architecture Search (NAS) aims to find efficient models for multi...
Efficient performance estimation of architectures drawn from large searc...
Searching for the architecture cells is a dominant paradigm in NAS. Howe...
Contemporary deep-learning object detection methods for autonomous drivi...
Recently, the over-smoothing phenomenon of Transformer-based models is observ...
Continual learning needs to overcome catastrophic forgetting of the past...
Neural architecture search (NAS) has shown encouraging results in automa...
In this work, we introduce a novel strategy for long-tail recognition th...