Large and sparse feed-forward networks (S-FFN) such as Mixture-of-Expert...
Self-supervised pretraining has made few-shot learning possible for many...
Current methods for few-shot fine-tuning of pretrained masked language m...
Large-scale autoregressive language models such as GPT-3 are few-shot learners...
Do language models have beliefs about the world? Dennett (1995) famously...
Retrieving relevant contexts from a large corpus is a crucial step for t...
The state of the art on many NLP tasks is currently achieved by large pretrained...
Recent breakthroughs in pretrained language models have shown the effectiveness...
In recent years, sentiment analysis in social media has attracted a lot ...
We describe the Sentiment Analysis in Twitter task, run as part of SemEval...
In this paper, we describe the 2015 iteration of the SemEval shared task...
This paper discusses the fourth year of the “Sentiment Analysis in Twitt...
This paper shows that pretraining multilingual language models at scale ...
We study the problem of multilingual masked language modeling, i.e. the...
The scarcity of labeled training data often prohibits the internationalization...
Language model pretraining has led to significant performance gains but...
Traditional language models are unable to efficiently model entity names...
State-of-the-art natural language processing systems rely on supervision...
Neural Machine Translation (NMT) typically leverages monolingual data in...