As the size of transformer-based models continues to grow, fine-tuning t...
Recent progress in deep learning is essentially based on a "big data for...
Rapid progress has been witnessed for human-object interaction (HOI)
rec...
This work proposes to combine neural networks with the compositional
hie...
Detecting 3D objects from a single RGB image is intrinsically ambiguous,...
Learning transferable knowledge across similar but different settings is...
We propose a new 3D holistic++ scene understanding problem, which jointl...
We propose a novel model to address the task of Visual Dialog which exhi...
We propose VRGym, a virtual reality testbed for realistic human-robot
in...
Holistic 3D indoor scene understanding refers to jointly recovering the ...
We present a human-centric method to sample and synthesize 3D room layou...
This paper addresses the task of detecting and recognizing human-object
...
We propose a computational framework to jointly parse a single RGB image...
Future predictions on sequence data (e.g., videos or audios) require the...
This paper proposes an intent-aware multi-agent planning framework as we...
This paper presents a novel method to predict future human activities fr...
We propose the configurable rendering of massive quantities of photoreal...