Temporal representation is the cornerstone of modern action detection te...
Cross-modality interaction is a critical component in Text-Video Retriev...
Recent works have shown that convolutional networks have substantially i...
Few-shot class-incremental learning (FSCIL) aims to design machine learn...
Nowadays, live-stream and short video shopping in E-commerce have grown ...
In many real-world datasets, like WebVision, the performance of DNN base...