The field of generative AI has a transformative impact on various areas,...
Large language models have become a potential pathway toward achieving a...
With the development of deep learning, the field of face anti-spoofing (...
This paper shows that Masking the Deep hierarchical features is an effic...
Recent self-supervised methods are mainly designed for representation le...
Recently, the perception task based on Bird's-Eye View (BEV) representation ...
Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes...
Human-Object Interaction (HOI) detection aims to learn how human interac...
Document-level natural language inference (DOCNLI) is a new challenging ...
Multimodal ambiguity and color bleeding remain challenging in colorizati...
Existing Binary Neural Networks (BNNs) mainly operate on local convoluti...
Mainstream object detectors are commonly constituted of two sub-tasks, i...
Though impressive performance has been achieved in specific visual realm...
Capitalizing on large pre-trained models for various downstream tasks of...
The field of face anti-spoofing (FAS) has witnessed great progress with ...
Document-level Event Causality Identification (DECI) aims to identify ca...
Realistic visual media synthesis is becoming a critical societal issue w...
In computer vision, pre-training models based on large-scale supervised l...
Large-scale datasets play a vital role in computer vision. Existing data...
Contrastive Language-Image Pretraining (CLIP) has emerged as a novel par...
In this paper, we propose an effective yet efficient model PAIE for both...
Recently, self-supervised vision transformers have attracted unprecedent...
Unsupervised sentence embedding aims to obtain the most appropriate embe...
The rapid progress of photorealistic synthesis techniques has reached a ...
The visual world naturally exhibits a long-tailed distribution of open c...
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has...
Face anti-spoofing (FAS) is an indispensable and widely used module in f...
The rapid progress of photorealistic synthesis techniques has reached at...
As facial interaction systems are prevalently deployed, security and rel...
Recently, deep learning has been utilized to solve video recognition pro...
Seeking effective neural networks is a critical and practical field in d...
As facial interaction systems are prevalently deployed, security and rel...
As realistic facial manipulation technologies have achieved remarkable p...
This technical report introduces our winning solution to the spatio-temp...
Localizing persons and recognizing their actions from videos is a challe...
3D point cloud completion, the task of inferring the complete geometric ...
Semantic image synthesis aims at generating photorealistic images from s...
Text-image cross-modal retrieval is a challenging task in the field of l...
Synthesizing photo-realistic images from text descriptions is a challeng...
Dense captioning aims at simultaneously localizing semantic regions and ...
This paper proposes the novel task of video generation conditioned on a ...
Imagining multiple consecutive frames given one single snapshot is chall...
Referring expression grounding aims at locating certain objects or perso...
Multi-label image classification is a fundamental but challenging task t...
Pedestrian attribute recognition has attracted much attention due to it...
Convolutional neural networks have achieved remarkable success in comput...
Recognizing visual relationships <subject-predicate-object> among any pa...
Zero-shot artistic style transfer is an important image synthesis proble...
This paper proposes learning disentangled but complementary face feature...
The aim of image captioning is to generate similar captions by machine a...