Decentralized minimax optimization has been actively studied in the past...
Joint video-language learning has received increasing attention in recen...
Matching-based methods, especially those based on space-time memory, are...
Existing vision-language pre-training (VLP) methods primarily rely on pa...
Recently, MLP-like vision models have achieved promising performances on...
This paper presents Poisoning MorphNet, the first backdoor attack method...
MoCo is effective for unsupervised image representation learning. In thi...
Automatically generating sentences to describe events and temporally
loc...
The task of temporally grounding language queries in videos is to tempor...
Purpose: To develop a deep learning-based Bayesian inference for MRI
rec...
In this paper, we propose to guide the video caption generation with
Par...
Abdominal magnetic resonance imaging (MRI) provides a straightforward wa...
In this paper, we propose a novel model with a hierarchical photo-scene
...
Recently, much advance has been made in image captioning, and an
encoder...
Recently, much advance has been made in image captioning, and an
encoder...
Dense video captioning is a newly emerging task that aims at both locali...
Recently, caption generation with an encoder-decoder framework has been
...
Domain adaptation problems arise in a variety of applications, where a
t...