The automatic co-speech gesture generation draws much attention in compu...
Generating 3D faces from textual descriptions has a multitude of
applica...
Stable diffusion, a generative model used in text-to-image synthesis,
fr...
In this paper, we introduce the DiffuseStyleGesture+, our solution for t...
This paper presents a novel network structure with illumination-aware ga...
Continual learning aims to enable a model to incrementally learn knowled...
Pre-trained vision-language models have inspired much research on few-sh...
Top-down methods dominate the field of 3D human pose and shape estimatio...
Instance segmentation in 3D scenes is fundamental in many applications o...
Despite the substantial progress of active learning for image recognitio...
Given an incomplete image without additional constraint, image inpaintin...
Category-level 6D object pose and size estimation is to predict 9
degree...
How to estimate the quality of the network output is an important issue,...
Temporal action localization is an important and challenging task that a...
Multiple human parsing aims to segment various human parts and associate...
We present a method for depth estimation with monocular images, which ca...
We show that existing upsampling operators can be unified with the notio...