Recent cross-lingual cross-modal works attempt to extend Vision-Language...
Recent progress in diffusion models has revolutionized the popular techn...
Recent years have witnessed the rise and success of pre-training techniq...
Recent Vision-Language Pre-trained (VLP) models based on dual encoder ha...
Recent efforts of multimodal Transformers have improved Visually Rich
Do...
Conventional methods for the image-text generation tasks mainly tackle t...
We propose a knowledge-enhanced approach, ERNIE-ViL, to learn joint
repr...