The Transformer has been successfully used in practical applications, such a...
Task-agnostic knowledge distillation attempts to address the problem of
...
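As a point of reference for the task-agnostic knowledge distillation mentioned above, a minimal sketch of the standard distillation objective (temperature-softened teacher/student distributions matched with a KL term, following Hinton et al.'s formulation) is shown below; this is a generic illustration, not the specific method of any paper excerpted here, and the function names are hypothetical.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces a softer distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

In a task-agnostic setting this loss would be applied during general pre-training (over the teacher's output distributions) rather than on any downstream task's labels.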
We argue that a form of the valuable information provided by the auxilia...
Pre-trained language models have achieved state-of-the-art results in va...
Pre-trained models have achieved state-of-the-art results in various Nat...
This paper proposes to integrate the auxiliary information (e.g., additi...
Pretrained language models (PLMs) such as BERT adopt a training paradigm...