While current visual captioning models have achieved impressive performa...
Document understanding refers to automatically extract, analyze and
comp...
Image captioning aims to describe visual content in natural language. As...
To promote the development of Vision-Language Pre-training (VLP) and
mul...
To help the visually impaired enjoy movies, automatic movie narrating sy...
Automatic image captioning evaluation is critical for benchmarking and
p...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
Multimodal processing has attracted much attention lately especially wit...
For an image with multiple scene texts, different people may be interest...
Most current image captioning systems focus on describing general image
...