This paper aims to tackle a novel task: Temporal Sentence Grounding in ...
How to enable learnability for new classes while keeping the capability ...
The composed image retrieval (CIR) task aims to retrieve the desired tar...
The last decade has witnessed the proliferation of micro-videos on vario...
Several studies have recently pointed out that existing Visual Question Answ...
Visual Question Answering (VQA) is fundamentally compositional in nature...
Conventional knowledge distillation (KD) methods for object detection ma...
Scene Graph Generation, which generally follows a regular encoder-decode...
Recently, some contrastive learning methods have been proposed to simult...
Understanding food recipes requires anticipating the implicit causal effe...
While successful in many fields, deep neural networks (DNNs) still suffe...
The correspondence between residual networks and dynamical systems motiv...
The panoptic segmentation task requires a unified result from semantic a...
The self-attention mechanism has been widely used for various tasks. It is d...
Recently, a number of learning-based optimization methods that combine d...
Recently developed deep unsupervised methods allow us to jointly learn rep...
In this paper, we study the problem of matrix recovery, which aims to re...
Rain streaks can severely degrade visibility, which causes many curr...
Multi-view clustering has attracted much attention recently, which aims to ta...