Many studies focus on improving pretraining or developing new backbones ...
A key challenge with procedure planning in instructional videos lies in ...
We tackle the data scarcity challenge in few-shot point cloud recognitio...
The main challenge in video question answering (VideoQA) is to capture a...
Text-video retrieval contains various challenges, including biases comin...
Bio-inspired learning has been gaining popularity recently given that Ba...
A tiny object in the sky cannot be an elephant. Context reasoning is cri...
Tremendous progress has been made in continual learning to maintain good...
Scene text images have different shapes and are subjected to various dis...
Fine-grained action recognition is a challenging task in computer vision...
Conventional de-noising methods rely on the assumption that all samples ...
In this report, we present our approach for EPIC-KITCHENS-100 Multi-Inst...
With the emergence of social media, voluminous video clips are uploaded ...
Seas of videos are uploaded daily with the popularity of social channels...
When deploying a robot to a new task, one often has to train it to detec...
The focus of this paper is on the problem of image retrieval with attrib...
Fine-grained human action recognition is a core research topic in comput...
6D object pose estimation is widely applied in robotic tasks such as gra...
Continual learning is a critical ability of continually acquiring and tr...
Children benefit from lift-the-flap books by taking on an active role in...
Segmenting video content into events provides semantic structures for in...
Egocentric spatial memory (ESM) defines a memory system with encoding, s...
Can we infer intentions and goals from a person's actions? As an example...
Searching for a target object in a cluttered scene constitutes a fundame...