Several recent works have directly extended the image masked autoencoder...
The quality of training data impacts the performance of pre-trained larg...
Multi-view clustering has attracted broad attention due to its capacity ...
Large Language Models (LLMs), armed with billions of parameters, exhibit...
We target a 3D generative model for general natural scenes that are typi...
This paper proposes an anchor-based deformation model, namely AnchorDEF,...
Fast adversarial training (FAT) is an efficient method to improve robust...
Effective modeling of complex spatiotemporal dependencies in long-form v...
A good motion retargeting cannot be reached without reasonable considera...
Image inpainting aims to fill the missing hole of the input. It is hard ...
Reconstructing two hands from monocular RGB images is challenging due to...
Reconfigurable morphing surfaces provide new opportunities for advanced ...
Speech-driven 3D facial animation has been widely studied, yet there is ...
To generate high quality rendering images for real time applications, it...
Learning with noisy label (LNL) is a classic problem that has been exten...
Over the past few years, the prevalence of wireless devices has become o...
We present VideoReTalking, a new system to edit the faces of a real-worl...
We present a novel paradigm for high-fidelity face swapping that faithfu...
Language models (LMs) are becoming the foundation for almost all major l...
Deep neural networks (DNNs) have been shown to be vulnerable to adversar...
In adversarial machine learning, deep neural networks can fit the advers...
Adversarial Training (AT) has been demonstrated as one of the most effec...
Deep neural networks (DNNs) are shown to be vulnerable to adversarial ex...
Image retrieval has become an increasingly appealing technique with broa...
Video-Text Pre-training (VTP) aims to learn transferable representations...
Fast adversarial training (FAT) effectively improves the efficiency of s...
Implicit radiance functions emerged as a powerful scene representation f...
While adversarial training and its variants have shown to be the most ef...
Communication compression is a crucial technique for modern distributed ...
Existing 3D-aware facial generation methods face a dilemma in quality ve...
Although the pre-trained Vision Transformers (ViTs) achieved great succe...
Existing neural style transfer research has studied matching statisti...
In the fifth-generation and beyond era, reconfigurable intelligent surfa...
Video deblurring is still an unsolved problem due to the challenging spa...
Video transformers have recently emerged as an effective alternative to ...
Neural volume rendering has been proven to be a promising method for eff...
Unsupervised video representation learning has made remarkable achieveme...
Pre-training video transformers on extra large-scale datasets is general...
Recent studies in deepfake detection have yielded promising results when...
Adversarial training (AT) is always formulated as a minimax problem, of ...
One-shot talking face generation aims at synthesizing a high-quality tal...
Vision Transformers (ViTs) take all the image patches as tokens and cons...
Conventional 3D human pose estimation relies on first detecting 2D body ...
Contrastive learning has been proven suitable for learning sentence embe...
Previous portrait image generation methods roughly fall into two categor...
Neural Radiance Fields (NeRF) has recently gained popularity for its imp...
Neural Radiance Field (NeRF) has gained considerable attention recently ...
While contrastive learning greatly advances the representation of senten...
Studies on self-supervised visual representation learning (SSL) improve ...
Adversarial training (AT) has been demonstrated to be effective in impro...