Recently, the tokens of images share the same static data flow in many d...
Active domain adaptation (ADA) aims to improve the model adaptation
perf...
Recent works on personalized text-to-image generation usually learn to b...
We present a hardware-efficient architecture of convolutional neural net...
It is well believed that the higher uncertainty in a word of the caption...
Recently, vector quantized autoregressive (VQ-AR) models have shown
rema...
Ensemble of machine learning models yields improved performance as well ...
Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to
...
Existing approaches to image captioning usually generate the sentence
wo...
Referring video object segmentation aims to predict foreground labels fo...
BiSeNet has been proved to be a popular two-stream network for real-time...
In this paper, we present a novel and general network structure towards
...
We address the problem of cross-domain image retrieval, considering the
...
Convolutional Neural Network (CNN) has demonstrated promising performanc...