Referring image segmentation aims to segment the target object referred ...
Despite the stunning ability to generate high-quality images by recent t...
Recent vision-only perception models for autonomous driving achieved pro...
Generative models can be categorized into two types: explicit generative...
Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerf...
We introduce a new diffusion-based approach for shape completion on 3D r...
Diffusion models have attracted significant attention due to their remar...
The text-driven image and video diffusion models have achieved unprecede...
Perception systems in modern autonomous driving vehicles typically take ...
The performance of Large Language Models (LLMs) in reasoning tasks depen...
Diffusion models have proven to be highly effective in generating high-q...
Safety is the primary priority of autonomous driving. Nevertheless, no p...
We propose a simple, efficient, yet powerful framework for dense visual ...
Cooperatively utilizing both ego-vehicle and infrastructure sensor data ...
Recently, perception tasks based on Bird's-Eye View (BEV) representation ...
Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes...
Recent scene text detection methods are almost all based on deep learning an...
Recent studies show that Vision Transformers (ViTs) exhibit strong robust...
In this paper, we propose M^2BEV, a unified framework that jointly perfo...
Monocular visual odometry (VO) is an important task in robotics and comp...
3D visual perception tasks, including 3D detection and map segmentation ...
Although convolutional neural networks (CNNs) have achieved remarkable p...
We propose an accurate and efficient scene text detection framework, ter...
We present Panoptic SegFormer, a general framework for end-to-end panopt...
This paper presents a simple MLP-like architecture, CycleMLP, which is a...
We present SegFormer, a simple, efficient yet powerful semantic segmenta...
Reducing the complexity of the pipeline of instance segmentation is cruc...
Detecting transparent objects in natural scenes is challenging due to th...
We present an extremely simple Ultra-Resolution Style Transfer framework...
Unsupervised representation learning achieves promising performances in ...
Unsupervised contrastive learning achieves great success in learning ima...
This work presents a new fine-grained transparent object segmentation da...
Multiple-object tracking (MOT) is mostly dominated by complex and multi-s...
End-to-end one-stage object detection trailed thus far. This paper disco...
Although a polygon is a more accurate representation than an upright bou...
Deep learning-based scene text detection can achieve preferable performa...
Scene text spotting aims to detect and recognize the entire word or sent...
Multi-person pose estimation is challenging because it localizes body ke...
Low-resolution text images are often seen in natural scenes such as docu...
Transparent objects such as windows and bottles made of glass widely exi...
This article introduces the solutions of the two champion teams, `MMfrui...
In this paper, we introduce an anchor-box free and single shot instance ...
Scene text recognition has witnessed rapid development with the advance ...
Scene text detection, an important step of scene text reading systems, h...
Scene text detection methods based on deep learning have achieved remark...
Ordered binary decision diagrams (OBDDs) are an efficient data structure...
This paper investigates the issue of realistic plant species identificat...