The raw depth image captured by indoor depth sensors usually has an exte...
Weakly-supervised temporal action localization aims to locate action reg...
In this paper, we propose the semantic graph Transformer (SGT) for the 3...
Predicting attention regions of interest is an important yet challenging...
Video denoising aims to recover high-quality frames from the noisy video...
The raw depth image captured by the indoor depth sensor usually has an
e...
Deep learning-solutions for hand-object 3D pose and shape estimation are...
Scene graph generation refers to the task of automatically mapping an im...