We present PAT, a transformer-based network that learns complex temporal...
Generating grammatically and semantically correct captions in video
capt...
We propose a novel framework to reconstruct super-resolution human shape...
Active speaker detection (ASD) is a multi-modal task that aims to identi...
A common problem in the 4D reconstruction of people from multi-view vide...
This paper proposes a novel Attention-based Multi-Reference Super-resolu...
Estimating the pose of animals can facilitate the understanding of anima...
We present a new end-to-end learning framework to obtain detailed and
sp...
We present a novel method to learn temporally consistent 3D reconstructi...
We present a novel method to improve the accuracy of the 3D reconstructi...
Deep representation learning is a crucial procedure in multimedia analys...
Immersive audio-visual perception relies on the spatial integration of b...
Existing methods for stereo work on narrow baseline image pairs giving
l...
We present an approach to accurately estimate high fidelity markerless 3...
Semantic scene completion is the task of predicting a complete 3D
repres...
We introduce the first approach to solve the challenging problem of
unsu...
Existing techniques for dynamic scene reconstruction from multiple
wide-...
We present a convolutional autoencoder that enables high fidelity volume...
We present a method for simultaneously estimating 3D human pose and body...
Light-field video has recently been used in virtual and augmented realit...
Semantic scene completion is the task of producing a complete 3D voxel
r...
Apparatus and methods are disclosed for performing object-based audio
re...
This paper presents an approach for reconstruction of 4D temporally cohe...