Wasserstein distance, which measures the discrepancy between distributio...
Hate speech detection is complex; it relies on commonsense reasoning,
kn...
Self-supervised pretraining has made few-shot learning possible for many...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
Large-scale autoregressive language models such as GPT-3 are few-shot
le...
Do language models have beliefs about the world? Dennett (1995) famously...
The Wasserstein barycenter has been widely studied in various fields,
in...
At the heart of text-based neural models lie word representations, which...
Recent research efforts enable study for natural language grounded navig...
In recent years, sentiment analysis in social media has attracted a lot ...
In response to the continuing research interest in computational semanti...
In this paper, we describe SemEval-2013 Task 4: the definition, the data...
Recently, there has been strong interest in developing natural language
...
Neural word representations are at the core of many state-of-the-art nat...
Reading comprehension is a challenging task in natural language processi...
Knowledge graphs (KGs) are known to be helpful for the task of question
ans...