To promote the development of Vision-Language Pre-training (VLP) and
mul...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
Recent years have witnessed a big convergence of language, vision, and
m...
Multi-modal information is essential to describe what has happened in a
...