Instruction tuning unlocks the superior capability of Large Language Mod...
Document images are a ubiquitous source of data where the text is organi...
Recognizing out-of-distribution (OOD) samples is critical for machine le...
In this paper, we introduce a novel concept of user-entity differential ...
One of the major challenges in training text-to-image generation models ...
Recent studies indicate that NLU models are prone to rely on shortcut fe...
Most recent image captioning works are conducted in English as the major...
Change Captioning is a task that aims to describe the difference between...
Mobile energy storage systems (MESSs) provide mobility and flexibility t...
Scene graph generation has received growing attention with the advanceme...
Deep neural networks have achieved great success on the image captioning...
The explosion of video data on the internet requires effective and effic...
Image captioning is a multimodal task involving computer vision and natu...
Textual-visual cross-modal retrieval has been a hot research topic in bo...
The existing image captioning approaches typically train a one-stage sen...
Language Models based on recurrent neural networks have dominated recent...
In the last few years, deep learning has led to very good performance on...