Recent video grounding works attempt to introduce vanilla contrastiv...
Retrieval augmentation enables large language models to take advantage o...
Despite progress in vision-based inspection algorithms, real-world indus...
Convolution-BatchNorm (ConvBN) blocks are integral components in various...
With adversarial or otherwise normal prompts, existing large language mo...
Generative adversarial networks (GANs), trained on a large-scale image d...
State-of-the-art abstractive summarization systems frequently hallucinat...
Video grounding aims to locate a moment of interest matching the given q...
Let SLAut(𝔽_q^n) denote the group of all semilinear isometries on 𝔽_q^n,...
Video Referring Expression Comprehension (REC) aims to localize a target...
We present evidence for the existence and effectiveness of adversarial a...
Multi-view learning has progressed rapidly in recent years. Although man...
We investigate the problem of video Referring Expression Comprehension (...
Video-Text Pre-training (VTP) aims to learn transferable representations...
Automatic summarization is the process of shortening a set of textual da...
Unsupervised Source (data) Free domain adaptation (USFDA) aims to transf...
Unsupervised pixel-level defective region segmentation is an important t...
Unsupervised video representation learning has made remarkable achieveme...
Display front-of-screen (FOS) quality inspection is essential for the ma...
Graph Neural Networks (GNNs) have achieved great success in various task...
As the adoption of deep learning techniques in industrial applications g...
Message passing is the core of most graph models such as Graph Convoluti...
Video grounding aims to localize the temporal segment corresponding to a...
Recent research has witnessed advances in facial image editing tasks inc...
Weakly-Supervised Temporal Action Localization (WSTAL) aims to localize ...
Arbitrary-shaped text detection is a challenging task since curved texts...
As the applications of deep learning models on edge devices increase at ...
Video Frame Interpolation synthesizes non-existent images between adjace...
Human-Object Interaction (HOI) detection is devoted to learning how humans int...
Weakly-supervised temporal action localization (WS-TAL) aims to localize...
Matrix-product codes over finite fields are an important class of long l...
Neural abstractive summarization systems have achieved promising progres...
Inferring missing facts in temporal knowledge graphs (TKGs) is a fundame...
During the COVID-19 pandemic, rapid and accurate triage of patients at t...
Recent research has witnessed advances in facial image editing tasks...
Deep learning-based scene text detection methods have progressed substan...
Unsupervised domain adaptation (UDA) is an emerging research topic in th...
Domain adaptation (DA) is an emerging research topic in the field of mac...
The hull of linear codes plays an important role in quantum information ...
Referring Expression Generation (REG) is the task of generating contextu...