Large Vision-Language Models (LVLMs) have recently achieved remarkable s...
Vision-Language Pre-training (VLP) methods based on object detection enj...
Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models...
Document understanding refers to automatically extracting, analyzing and comp...
To promote the development of Vision-Language Pre-training (VLP) and mul...
We propose to Transform Scene Graphs (TSG) into more descriptive caption...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
In this paper, we present ChatPLUG, a Chinese open-domain dialogue syste...
Recent years have witnessed a big convergence of language, vision, and m...
Aligning objects with words plays a critical role in Image-Language BERT...
Video-language pre-training has advanced the performance of various down...
Aerial scene classification remains challenging as: 1) the size of key o...
Based on CT and MRI images acquired from normal pressure hydrocephalus (...
Video summarization aims to automatically generate a diverse and concise...
The world is currently experiencing an ongoing pandemic of an infectious...
Explainable Artificial Intelligence (XAI) is an emerging research topic ...
Video summarization is an effective way to facilitate video searching an...
With the improvements of Los Angeles in many aspects, people in mounting...