Text-to-motion generation is a formidable task, aiming to produce human
...
State-of-the-art deep neural networks are trained with large amounts
(mi...
Dynamic vision sensors or event cameras provide rich complementary
infor...
We propose a novel task for generating 3D dance movements that simultane...
Sequential video understanding, as an emerging video understanding task,...
Current audio-visual separation methods share a standard architecture de...
Existing fine-tuning methods either tune all parameters of the pre-train...
Repetitive actions are widely seen in human activities such as
...
In this paper, we propose a novel sequence verification task that aims t...
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper.
Di...
An LBYL (`Look Before You Leap') Network is proposed for end-to-end trai...
By borrowing the wisdom of humans in gaze following, we propose a two-sta...
Single-image piece-wise planar 3D reconstruction aims to simultaneously
...
Anomaly detection in videos refers to the identification of events that ...