In this paper we establish an abstraction of on-the-fly determinization ...
Large language models rely on real-valued representations of text to mak...
Sampling is a common strategy for generating text from probabilistic mod...
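As a concrete illustration of sampling from a probabilistic text model, here is a minimal sketch of ancestral sampling from a toy bigram model; the bigram table and special tokens are hypothetical, not taken from any of the papers listed here.

```python
# Minimal sketch of ancestral sampling: draw each token from the model's
# conditional distribution given the previous token, until end-of-string.
# The bigram table below is a made-up toy model.
import random

BIGRAMS = {
    "<bos>":  {"the": 0.6, "a": 0.4},
    "the":    {"cat": 0.5, "dog": 0.5},
    "a":      {"cat": 0.3, "dog": 0.7},
    "cat":    {"sleeps": 0.8, "<eos>": 0.2},
    "dog":    {"barks": 0.7, "<eos>": 0.3},
    "sleeps": {"<eos>": 1.0},
    "barks":  {"<eos>": 1.0},
}

def sample(max_len=10):
    token, out = "<bos>", []
    for _ in range(max_len):
        dist = BIGRAMS[token]
        token = random.choices(list(dist), weights=list(dist.values()))[0]
        if token == "<eos>":
            break
        out.append(token)
    return " ".join(out)

print(sample())  # e.g. "the dog barks"
```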
A fundamental result in psycholinguistics is that less predictable words...
Many popular feature-attribution methods for interpreting deep neural networks...
This paper provides a reference description, in the form of a deduction ...
Subword tokenization is a key part of many NLP pipelines. However, littl...
Byte-Pair Encoding (BPE) is a popular algorithm used for tokenizing data...
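Since BPE figures in several entries here, a short sketch of its core training loop may be useful: repeatedly count adjacent symbol pairs and merge the most frequent one. The toy word list and merge count are illustrative only.

```python
# Minimal sketch of the core BPE loop: repeatedly merge the most frequent
# adjacent symbol pair across the corpus. Toy data, not a production tokenizer.
from collections import Counter

def learn_bpe(words, num_merges):
    vocab = Counter(tuple(w) for w in words)  # each word as a symbol tuple
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe(["low", "lower", "lowest", "low"], num_merges=3))
```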
We introduce a novel dependency parser, the hexatagger, that constructs ...
Concept erasure aims to remove specified features from a representation....
While natural languages differ widely in both canonical word order and w...
Weir has defined a hierarchy of language classes whose second member (ℒ_...
Recently, there has been a growing interest in the development of gradient...
Multiple algorithms are known for efficiently calculating the prefix probability...
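For context, under a locally normalized autoregressive model the prefix probability of w_1...w_k, i.e. the total probability of all strings beginning with that prefix, is simply the product of the conditionals, since the probability mass over all continuations sums to one; the grammar-based case this entry refers to is harder because it sums over all derivations of all completions. A toy sketch of the easy case, with made-up bigram probabilities:

```python
# Prefix probability under a locally normalized autoregressive model:
# p(prefix followed by anything) = prod_i p(w_i | w_<i), because the
# probabilities of all continuations of the prefix sum to one.
import math

BIGRAMS = {  # hypothetical conditional probabilities p(next | prev)
    "<bos>": {"the": 0.6, "a": 0.4},
    "the":   {"cat": 0.5, "dog": 0.5},
}

def log_prefix_prob(prefix):
    lp, prev = 0.0, "<bos>"
    for tok in prefix:
        lp += math.log(BIGRAMS[prev][tok])
        prev = tok
    return lp

print(math.exp(log_prefix_prob(["the", "cat"])))  # 0.6 * 0.5 = 0.3
```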
We show that most structured prediction problems can be solved in linear...
Transformer models have brought impressive advances in various NLP tasks, thus ...
The fixed-size context of the Transformer makes GPT models incapable of generating...
The primary way of building AI applications is shifting from training sp...
Several recent papers claim human parity at sentence-level Machine Translation...
Large language models generate fluent texts and can follow natural langu...
Recent advances in text-to-image diffusion models have enabled the gener...
Weighted finite-state automata (WFSAs) are commonly used in NLP. Failure...
Language modeling, a central task in natural language processing, involv...
Many dynamical systems exhibit latent states with intrinsic orderings su...
Over the past two decades, numerous studies have demonstrated how less predictable...
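In this literature, predictability is typically operationalized as surprisal, -log2 p(word | context): a lower-probability word has higher surprisal and is predicted to take longer to read. A trivial sketch with hypothetical probabilities:

```python
# Surprisal in bits of an event with probability p. The probability values
# below are made up for illustration.
import math

def surprisal(p):
    return -math.log2(p)

print(surprisal(0.5))   # 1.00 bit: highly predictable word
print(surprisal(0.01))  # ~6.64 bits: unpredictable word, slower reading
```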
Recent work has shown that despite their impressive capabilities, text-to-image...
There have been many proposals to reduce constituency parsing to tagging...
In this paper, we seek to measure how much information a component in a ...
Recent years have seen a paradigm shift in NLP towards using pretrained ...
Centering theory (CT; Grosz et al., 1995) provides a linguistic analysis...
Machine translation (MT) has almost achieved human parity at sentence-le...
Despite significant progress in the quality of language generated from a...
Previous work on concept identification in neural representations has fo...
Weighted pushdown automata (WPDAs) are at the core of many natural langu...
For the quantitative monitoring of international relations, political ev...
The ability to generalize compositionally is key to understanding the po...
The Bar-Hillel construction is a classic result in formal language theor...
Every legal case sets a precedent by developing the law in one of the fo...
Recombining known primitive concepts into larger novel combinations is a...
Neural language models are widely used; however, their model parameters ...
Probing is a popular method to discern what linguistic information is co...
The SIGMORPHON 2022 shared task on morpheme segmentation challenged syst...
While probabilistic language generators have improved dramatically over ...
Probing has become a go-to methodology for interpreting and analyzing de...
Many natural language processing tasks, e.g., coreference resolution and...
The Universal Morphology (UniMorph) project is a collaborative effort providing...
The success of multilingual pre-trained models is underpinned by their a...
Significance testing – especially the paired-permutation test – has play...
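This entry concerns computing the test efficiently; for reference, the standard Monte Carlo approximation of the paired-permutation test randomly flips the sign of each per-item score difference. The per-item scores below are made up.

```python
# Monte Carlo paired-permutation test for the difference in total score
# between two systems evaluated on the same items. The exact test enumerates
# all 2^n sign assignments instead of sampling them.
import random

def paired_permutation_test(a, b, trials=10_000, seed=0):
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        # Randomly swap each pair's labels, i.e. flip the sign of its diff.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            hits += 1
    return hits / trials  # estimated two-sided p-value

sys_a = [0.81, 0.77, 0.90, 0.65, 0.72]  # hypothetical per-item scores
sys_b = [0.79, 0.70, 0.88, 0.66, 0.69]
print(paired_permutation_test(sys_a, sys_b))
```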
A central quest of probing is to uncover how pre-trained models encode a...
Shannon entropy is often a quantity of interest to linguists studying th...
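For reference, the simplest estimator of Shannon entropy is the plug-in (maximum-likelihood) estimate, which substitutes empirical frequencies into the entropy formula; it is biased low on small samples, which is one reason better estimators are studied. The sample data below are arbitrary.

```python
# Plug-in (MLE) estimate of Shannon entropy in bits from a sample.
import math
from collections import Counter

def plugin_entropy(sample):
    counts = Counter(sample)
    n = len(sample)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(plugin_entropy("aababcabcd"))  # entropy estimate in bits
```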