Recently, large-scale pre-trained language-image models like CLIP have s...
Point-supervised Temporal Action Localization (PSTAL) is an emerging res...
Spatial convolutions are extensively used in numerous deep video models....
Current state-of-the-art approaches for few-shot action recognition achi...
Recent incremental learning for action recognition usually stores repres...
Standard approaches for video recognition usually operate on the full in...
Spatial convolutions are widely used in numerous deep video models. It f...
The central idea of contrastive learning is to discriminate between diff...
Temporal action localization aims to localize starting and ending time w...
Most recent approaches for online action detection tend to apply Recurre...
The Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to rec...
This technical report presents our solution for temporal action detectio...
This paper presents our solution to the AVA-Kinetics Crossover Challenge...
This technical report analyzes an egocentric video action detection meth...
With the recent surge in research on vision transformers, they have ...
Self-supervised learning shows remarkable performance in utilizing un...
Temporal action proposal generation aims to estimate temporal intervals ...
In this report, we present our solution for the task of temporal action ...
This technical report analyzes a temporal action localization method we ...