Real-time recurrent learning (RTRL) for sequence-processing recurrent ne...
Current state-of-the-art object-centric models use slots and attention-b...
Few-shot learning with sequence-processing neural networks (NNs) has rec...
Unsupervised learning of discrete representations from continuous ones i...
Short-term memory in standard, general-purpose, sequence-processing recu...
Well-designed diagnostic tasks have played a key role in studying the fa...
Work on fast weight programmers has demonstrated the effectiveness of ke...
Neural ordinary differential equations (ODEs) have attracted much attent...
The discovery of reusable sub-routines simplifies decision-making and pl...
Linear layers in neural networks (NNs) trained by gradient descent can b...
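A minimal NumPy sketch of one such equivalence, assuming the standard "dual form" of a linear layer updated by rank-one gradient steps (all names and shapes here are illustrative, not taken from the paper): the trained layer's output equals the initial layer's output plus unnormalised attention over the stored training patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, T = 5, 3, 8

W0 = rng.normal(size=(d_out, d_in))
xs = rng.normal(size=(T, d_in))    # hypothetical training inputs
es = rng.normal(size=(T, d_out))   # hypothetical error signals (-lr * dL/dy_t)

# Primal view: apply the rank-one updates to the weight matrix.
W = W0.copy()
for x_t, e_t in zip(xs, es):
    W += np.outer(e_t, x_t)

x_test = rng.normal(size=d_in)
y_primal = W @ x_test

# Dual view: keep W0 fixed and attend over the stored training inputs,
# weighting each error signal by the dot product <x_t, x_test>.
attn = xs @ x_test                 # unnormalised attention scores
y_dual = W0 @ x_test + es.T @ attn

assert np.allclose(y_primal, y_dual)
```

Both views compute the same test-time output; the dual view simply leaves the initial weights untouched and re-reads the training patterns at query time.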
The weight matrix (WM) of a neural network (NN) is its program. The prog...
We share our experience with the recently released WILDS benchmark, a co...
The inputs and/or outputs of some neural nets are weight matrices of oth...
Despite successes across a broad range of applications, Transformers hav...
Recently, many datasets have been proposed to test the systematic genera...
Transformers with linearised attention ("linear Transformers") have demo...
We show the formal equivalence of linearised self-attention mechanisms a...
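As a hedged illustration of that equivalence, the sketch below (plain NumPy, assuming unnormalised linear attention with identity feature maps; variable names are illustrative) computes the same outputs two ways: by summing over all past key/value pairs, and by incrementally programming a fast weight matrix with outer products.

```python
import numpy as np

rng = np.random.default_rng(1)
d_k, d_v, T = 4, 3, 6
ks = rng.normal(size=(T, d_k))   # keys
vs = rng.normal(size=(T, d_v))   # values
qs = rng.normal(size=(T, d_k))   # queries

# Linearised self-attention: y_t = sum_{j<=t} v_j * (k_j . q_t)
ys_attn = []
for t in range(T):
    scores = ks[: t + 1] @ qs[t]
    ys_attn.append(vs[: t + 1].T @ scores)

# Fast-weight view: W_t = W_{t-1} + v_t k_t^T, then query with q_t.
W = np.zeros((d_v, d_k))
ys_fw = []
for t in range(T):
    W += np.outer(vs[t], ks[t])
    ys_fw.append(W @ qs[t])

assert np.allclose(ys_attn, ys_fw)
```

The fast-weight loop never revisits past keys and values; it accumulates their outer products into a single matrix, which is what makes the linearised formulation attractive for long sequences.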
We present a complete training pipeline to build a state-of-the-art hybr...
We explore multi-layer autoregressive Transformer models in language mod...
We present state-of-the-art automatic speech recognition (ASR) systems e...
Lingvo is a Tensorflow framework offering a complete solution for collab...
We evaluate attention-based encoder-decoder models along two dimensions:...
Sequence-to-sequence attention-based models on subword units allow simpl...