Most existing forecasting systems are memory-based methods, which attemp...
The combination of audio and vision has long been a topic of interest in...
With the exponential growth of video data, there is an urgent need for
a...
Action detection is a challenging video understanding task, requiring
mo...
In this report, we present our champion solution to the WSDM2023 Toloka
...
The foundation models have recently shown excellent performance on a var...
In this report, we present our champion solutions to five tracks at Ego4...
Capturing the state changes of interacting objects is a key technology f...
We provide the technical report for Ego4D audio-only diarization challen...
Cumulative probability models (CPMs) are a robust alternative to linear
...
Temporal action detection (TAD) is extensively studied in the video
unde...
Temporal action detection aims to locate the boundaries of action in the...
Spoken dialogue systems such as Siri and Alexa provide great convenience...