Large language models (LLMs) have recently demonstrated remarkable
capab...
Large Vision-Language Models (LVLMs) have recently achieved remarkable
s...
Vision-Language Pre-training (VLP) methods based on object detection enj...
Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models...
Document understanding refers to automatically extracting, analyzing and
comp...
Vision Transformer (ViT) is now dominating many vision tasks. The drawbac...
To promote the development of Vision-Language Pre-training (VLP) and
mul...
Fine-tuning large pre-trained language models on various downstream task...
Cross-modal contrastive learning in vision language pretraining (VLP) fa...
We propose to Transform Scene Graphs (TSG) into more descriptive caption...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
In this paper, we present ChatPLUG, a Chinese open-domain dialogue syste...
Recent years have witnessed a big convergence of language, vision, and
m...
We design a novel global-local Transformer named Ada-ClustFormer
(ACF) t...
Aligning objects with words plays a critical role in Image-Language BERT...
Video-language pre-training has advanced the performance of various
down...
Large-scale pretrained foundation models have been an emerging paradigm ...
Image Captioning (IC) has achieved astonishing developments by incorpora...
The Visual Question Answering (VQA) task utilizes both visual image and
...
Existing approaches to vision-language pre-training (VLP) heavily rely o...
Vision-language pre-training (VLP) on large-scale image-text pairs has
a...
Advertising expenditures have become the major source of revenue for
e-c...
Vision-language pre-training (VLP) on large-scale image-text pairs has
r...
To drive purchase in online advertising, it is of the advertiser's great...
For e-commerce platforms such as Taobao and Amazon, advertisers play an
...
Recent years have witnessed a surge of interest in using neural topic m...
Multi-class text classification is one of the key problems in machine
le...
Text summarization aims to generate a headline or a short summary consis...
Abstractive text summarization is a challenging task, and one needs to de...
Speech emotion recognition is a challenging problem because humans convey...
In this paper, we present DELTA, a deep learning-based language technolog...