Diffusion-based methods can generate realistic images and videos, but th...
Recently, integrating video foundation models and large language models ...
Temporal modeling is crucial for various video learning tasks. Most rece...
One-shot object detection aims at detecting novel objects according to m...
Contrastive learning between different views of the data achieves outsta...
Self-supervised learning has been successfully applied to pre-train vide...
Self-attention has been successfully applied to video representation lea...
Effective spatiotemporal feature representation is crucial to the video-...
The recent research for person re-identification has been focused on two...
Deep learning, e.g., convolutional neural networks (CNNs), has achieved ...