Summarizing a video requires a diverse understanding of the video, rangi...
Temporal action detection aims to predict the time intervals and the cla...
We introduce You Only Train Once (YOTO), a dynamic human generation
fram...
This technical report introduces our solution, MEEV, proposed to the Ego...
Data augmentation has recently emerged as an essential component of mode...
To estimate the volume density and color of a 3D point in the multi-view...
Managing the dynamic regions in the photometric loss formulation has bee...
Joint object detection and online multi-object tracking (JDT) methods ha...
Recent self-supervised video representation learning methods focus on
ma...