Deep learning models have a risk of utilizing spurious clues to make pre...
As advanced image manipulation techniques emerge, detecting the manipula...
Language-guided human motion synthesis has been a challenging task due t...
Unsupervised domain adaptation (UDA) has gained increasing interest f...
Hand trajectory forecasting from egocentric views is crucial for enablin...
Despite the impressive performance obtained by recent single-image hand ...
The techniques for 3D indoor scene capturing are widely used, but the me...
We present a unified framework for camera-space 3D hand pose estimation ...
Direct optimization of interpolated features on multi-resolution voxel g...
Neural Radiance Fields (NeRF) have led to breakthroughs in the novel vie...
To date, little attention has been given to multi-view 3D human mesh est...
This paper presents a Generative RegIon-to-Text transformer, GRiT, for o...
Visually exploring in a real-world 4D spatiotemporal space freely in VR ...
Federated Learning (FL) is a machine learning paradigm where many local ...
Knowing the 3D motions in a dynamic scene is essential to many vision ap...
Despite the recent success of self-supervised contrastive learning mod...
We propose a method for estimating the 6DoF pose of a rigid object with ...
Transformer trackers have achieved impressive advancements recently, whe...
The multi-task learning (MTL) paradigm focuses on jointly learning two or mo...
Video super-resolution is currently one of the most active research topi...
The video instance segmentation (VIS) task requires classifying, segmenting,...
Video Instance Segmentation (VIS) aims to simultaneously classify, segme...
Recent transformer-based solutions have been introduced to estimate 3D h...
Action visual tempo characterizes the dynamics and the temporal scale of...
In recent years, graph convolutional networks (GCNs) have played an increasingl...
We present a method for reconstructing accurate and consistent 3D hands ...
We introduce the task of open-vocabulary visual instance search (OVIS). ...
This technical report presents our solution to the HACS Temporal Action ...
In this paper, we present an efficient and robust deep learning solution...
Weakly-supervised Temporal Action Localization (WS-TAL) methods learn to...
The objective of Weakly-supervised Temporal Action Localization (WS-TAL) is...
Reconstructing a 3D hand from a single-view RGB image is challenging due...
Most online multi-object trackers perform object detection stand-alone i...
Knowledge distillation is an effective approach to leverage a well-train...
Graph convolutional networks (GCNs) have recently demonstrated their pote...
Domain adaptation (DA) aims to transfer discriminative features learned ...
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify ...
Deep convolutional neural networks (CNNs) learned on large-scale labeled...
We consider the problem of Human-Object Interaction (HOI) Detection, whi...
Despite the previous success of object analysis, detecting and segmentin...
Monotone submodular maximization with a knapsack constraint is NP-hard. ...
Learning on 3D scene-based point clouds has received extensive attention ...
Deep hashing has shown promising results in image retrieval and recognit...
Motivated by the previous success of Two-Dimensional Convolutional Neura...
Generating long-range skeleton-based human actions has been a challengin...
Accurate 3D reconstruction of the hand and object shape from a hand-obje...
Skeleton-based action recognition has attracted increasing attention due...
To facilitate depth-based 3D action recognition, 3D dynamic voxel (3DV) ...
Recent advances in the joint processing of images have certainly shown i...
In this work, we study how well different types of approaches generalise ...