In this study, we address the challenge of 3D scene structure recovery f...
3D dense captioning requires a model to translate its understanding of a...
Recent strides in Text-to-3D techniques have been propelled by distillin...
The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 ha...
Reconstructing accurate 3D scenes from images is a long-standing vision ...
Although pre-trained large language models continue to advance, the...
Recently, deep learning-based facial landmark detection has achieved sig...
The recent advancements in image-text diffusion models have stimulated r...
Image matting requires high-quality pixel-level human annotations to sup...
Since the introduction of Vision Transformers, the landscape of many com...
Neural Radiance Fields (NeRF) has achieved impressive results in single ...
3D dense captioning aims to generate multiple captions localized with th...
We study a challenging task, conditional human motion generation, which ...
Motion capture from a monocular video is fundamental and crucial for us ...
Implicit neural 3D representation has achieved impressive results in sur...
In this paper, we address monocular depth estimation with deep neural ne...
3D human pose estimation from a monocular video has recently seen signif...
Facial semantic guidance (facial landmarks, facial parsing maps, facial ...
[Purpose] The pathology is decisive for disease diagnosis, but relies he...
Although vision transformers (ViTs) have achieved great success in compu...
Video creation has been an attractive yet challenging task for artists t...
Recent face reenactment works are limited by the coarse reference landma...
Intelligent data-driven fault diagnosis methods have been widely applied...
The goal of few-shot fine-grained image classification is to recognize r...
Respiratory diseases, including asthma, bronchitis, pneumonia, and upper...
Very recently, Window-based Transformers, which computed self-attention ...
Digital pathology slides are easy to store and manage, convenient to brows...
The low-level details and high-level semantics are both essential to the...
Recent works have widely explored the contextual dependencies to achieve...
Occluded person re-identification (ReID) aims to match occluded person i...
Real-time semantic segmentation plays a significant role in industry app...
The visual tracking problem demands efficiently performing robust classifica...
Learning discriminative global features plays a vital role in semantic s...
Detecting humans in a crowd is a challenging problem due to the uncertain...
This report presents our method, which won the nuScenes 3D Detection Chal...
Scene text detection, an important step of scene text reading systems, h...
Current state-of-the-art approaches for spatio-temporal action detection...
Real-time generic object detection on mobile platforms is a crucial but ...
Panoptic segmentation, which needs to assign a category label to each pi...
This paper presents a review of the 2018 WIDER Challenge on Face and Ped...
Existing pose estimation approaches can be categorized into single-stage...
Scene text detection methods based on deep learning have achieved remark...
Recent advances in deep convolutional neural networks (CNNs) have motiva...
Semantic segmentation requires both rich spatial information and sizeabl...
Human detection has witnessed impressive progress in recent years. Howev...
Most existing methods of semantic segmentation still suffer from two asp...
Face detection serves as a fundamental research topic for many applicati...
Recent CNN-based object detectors, no matter one-stage methods like YOLO...
In this paper we present a robust tracker to solve the multiple object t...
The topic of multi-person pose estimation has been largely improved rece...