Retrieval augmentation is a powerful but expensive method to make langua...
Memory-augmentation is a powerful approach for efficiently incorporating...
Multi-query attention (MQA), which only uses a single key-value head, dr...
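As a rough illustration of the mechanism named in the entry above, here is a minimal NumPy sketch of multi-query attention: several query heads all attend against one shared key/value head. The shapes and names (multi_query_attention, wq, wk, wv, d_head) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, wq, wk, wv, num_heads):
    """Multi-query attention: num_heads query heads share ONE key/value head.

    x:  (seq, d_model)
    wq: (d_model, num_heads * d_head)  -- per-head query projections
    wk: (d_model, d_head)              -- single shared key projection
    wv: (d_model, d_head)              -- single shared value projection
    """
    seq, d_model = x.shape
    d_head = wk.shape[1]

    q = (x @ wq).reshape(seq, num_heads, d_head)  # (seq, H, d_head)
    k = x @ wk                                    # (seq, d_head), shared by all heads
    v = x @ wv                                    # (seq, d_head), shared by all heads

    # Every query head scores against the same shared key head.
    scores = np.einsum("qhd,kd->hqk", q, k) / np.sqrt(d_head)
    probs = softmax(scores, axis=-1)              # (H, seq, seq)
    out = np.einsum("hqk,kd->qhd", probs, v)      # (seq, H, d_head)
    return out.reshape(seq, num_heads * d_head)

# Example: 8 query heads of size 64 sharing one key/value head.
rng = np.random.default_rng(0)
x  = rng.normal(size=(10, 512))
wq = rng.normal(size=(512, 8 * 64))
wk = rng.normal(size=(512, 64))
wv = rng.normal(size=(512, 64))
print(multi_query_attention(x, wq, wk, wv, num_heads=8).shape)  # (10, 512)
```

Because only one key/value head has to be cached per layer, the decoder's key/value cache shrinks by roughly a factor of the number of heads, which is where the inference speedup comes from.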
We present our work on developing a multilingual, efficient text-to-text...
Webpages have been a rich resource for language and vision-language task...
Webpages have been a rich, scalable resource for vision-language and lan...
We propose Conditional Adapter (CoDA), a parameter-efficient transfer le...
Many natural language processing tasks benefit from long inputs, but pro...
Retrieval-augmented language models such as Fusion-in-Decoder are powerf...
Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model...
Training large, deep neural networks to convergence can be prohibitively...
A common recent approach to semantic parsing augments sequence-to-sequen...
We combine the capacity of sparsely gated Mixture-of-Experts (MoE) with ...
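For readers unfamiliar with the term, the following is a minimal NumPy sketch of a sparsely gated Mixture-of-Experts layer: a router scores the experts for each token, only the top-k experts are activated, and their outputs are mixed by the renormalized gate scores. Names such as sparse_moe_layer, gate_weights, and top_k are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sparse_moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Sparsely gated Mixture-of-Experts: each token is routed to its
    top-k experts, whose outputs are combined by the gate scores.

    x:              (num_tokens, d_model)
    expert_weights: (num_experts, d_model, d_model)  -- toy one-matrix "experts"
    gate_weights:   (d_model, num_experts)           -- router projection
    """
    logits = x @ gate_weights                         # (tokens, experts)

    # Keep only the top-k experts per token; mask the rest before softmax.
    kth = np.sort(logits, axis=-1)[:, -top_k][:, None]
    masked = np.where(logits >= kth, logits, -np.inf)
    gates = np.exp(masked - masked.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)        # sparse gate probabilities

    out = np.zeros_like(x)
    for e in range(expert_weights.shape[0]):
        chosen = gates[:, e] > 0                      # tokens routed to expert e
        if chosen.any():
            out[chosen] += gates[chosen, e:e+1] * (x[chosen] @ expert_weights[e])
    return out
```

With top_k much smaller than the number of experts, only a small fraction of the parameters is touched per token, which is how sparse MoE grows model capacity without a proportional increase in compute.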
Machine learning models such as Transformers or LSTMs struggle with task...
Sequence modeling has demonstrated state-of-the-art performance on natur...
Recent work has shown that either (1) increasing the input length or (2)...
Deep learning models do well at generalizing to in-distribution data but...
We present ShopTalk, a multi-turn conversational faceted search system f...
Several studies have reported the inability of Transformer models to gen...
Compositional generalization is the ability to generalize systematically...
Knowledge-intensive tasks such as question answering often require assim...
We show that Transformer encoder architectures can be massively sped up,...
The Transformer is the backbone of modern NLP models. In this paper, we prop...
Transformer-based models, such as BERT, have been one of the most succe...
Transformer-based models have pushed the state of the art in many natura...