Scaling Transformers to longer sequence lengths has been a major problem...
The BigCode community, an open-scientific collaboration working on the r...
Time series modeling is a well-established problem, which often requires...
Recent advances in deep learning have relied heavily on the use of large...
State space models (SSMs) have high performance on long sequence modelin...
State space models (SSMs) have demonstrated state-of-the-art sequence mo...
Spectral analysis provides one of the most effective paradigms for infor...
Visual data such as images and videos are typically modeled as discretiz...
Normalizing flows model complex probability distributions using maps obt...
Communication compression is a crucial technique for modern distributed...
Training foundation models, such as GPT-3 and PaLM, can be extremely exp...
Transformers are slow and memory-hungry on long sequences, since the tim...
Overparameterized neural networks generalize well but are expensive to t...
Recent advances in efficient Transformers have exploited either the spar...
Recurrent neural networks (RNNs), temporal convolutions, and neural diff...
A popular approach to model compression is to train an inexpensive stude...
An important goal of neural architecture search (NAS) is to automate-awa...
Modern neural network architectures use structured linear transformation...
A central problem in learning from sequential data is representing cumul...
Computing the permanent of a non-negative matrix is a core problem with...
Compressing word embeddings is important for deploying NLP models in mem...
Fast linear transforms are ubiquitous in machine learning, including the...
We investigate how to train kernel approximation methods that generalize...
The low displacement rank (LDR) framework for structured matrices repres...
Data augmentation, a technique in which a training set is expanded with...