Video Semantic Role Labeling (VidSRL) aims to detect the salient events ...
Visual spatial description (VSD) aims to generate texts that describe th...
Image-to-text tasks, such as open-ended image captioning and controllabl...
Graph Neural Networks (GNNs) have gained great popularity in tackling va...
Speaker embedding is an important front-end module to explore discrimina...
Automatic speaker verification (ASV) systems, which determine whether tw...
This work studied embedding positions of digital audio watermarking in w...