Masked image modeling has demonstrated great potential to eliminate the ...
A big convergence of language, vision, and multimodal pretraining is eme...
Masked image modeling (MIM) has demonstrated impressive results in self-...
We introduce a vision-language foundation model called VL-BEiT, which is...
As more and more pre-trained language models adopt on-cloud deployment, ...
We introduce Corrupted Image Modeling (CIM) for self-supervised visual p...
We present a unified Vision-Language pretrained Model (VLMo) that jointl...
Pretrained bidirectional Transformers, such as BERT, have achieved signi...
ELECTRA pretrains a discriminator to detect replaced tokens, where the r... (a toy sketch of this replaced-token-detection objective follows these abstracts)
We introduce a self-supervised vision representation model BEiT, which s...
Recent progress in abstractive text summarization largely relies on larg...
We generalize deep self-attention distillation in MiniLM (Wang et al., 2...
We propose to pre-train a unified language model for both autoencoding a...
Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its va...
In this paper, we study a novel task that learns to compose music from n...
Automatic question generation aims to generate questions from a text pas...
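The ELECTRA entry above is the one abstract here that states its mechanism in full before the truncation: a discriminator is pretrained to detect, position by position, which tokens of a corrupted input were replaced. The sketch below is a minimal PyTorch illustration of that objective, not the paper's implementation; the toy sampling generator, the model sizes, and the 15% corruption rate are all assumptions made for the example.

    # Minimal sketch of ELECTRA-style replaced token detection.
    # All sizes, the toy generator, and the corruption rate are
    # illustrative assumptions, not the paper's configuration.
    import torch
    import torch.nn as nn

    vocab_size, hidden, seq_len, batch = 100, 32, 16, 4

    # Toy "generator": proposes plausible tokens for corrupted positions.
    gen = nn.Sequential(nn.Embedding(vocab_size, hidden),
                        nn.Linear(hidden, vocab_size))
    # Toy "discriminator": scores each token as replaced vs. original.
    disc = nn.Sequential(nn.Embedding(vocab_size, hidden),
                         nn.Linear(hidden, 1))

    tokens = torch.randint(0, vocab_size, (batch, seq_len))
    mask = torch.rand(batch, seq_len) < 0.15          # positions to corrupt

    gen_logits = gen(tokens)                          # (batch, seq, vocab)
    sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask, sampled, tokens)

    # Label 1 only where the token actually differs from the original;
    # a sampled token that happens to match counts as "original".
    labels = (corrupted != tokens).float()
    disc_logits = disc(corrupted).squeeze(-1)         # per-token score
    loss = nn.functional.binary_cross_entropy_with_logits(disc_logits, labels)
    # Sampling is non-differentiable, so only the discriminator learns
    # from this loss; the real generator is trained with its own MLM loss.
    loss.backward()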