Scaling Transformers to longer sequence lengths has been a major problem...
The BigCode community, an open-scientific collaboration working on the r...
Time series modeling is a well-established problem, which often requires...
Recent advances in deep learning have relied heavily on the use of large...
State space models (SSMs) have high performance on long sequence modelin...
State space models (SSMs) have demonstrated state-of-the-art sequence mo...
Spectral analysis provides one of the most effective paradigms for infor...
Visual data such as images and videos are typically modeled as discretiz...
Normalizing flows model complex probability distributions using maps obt...
Communication compression is a crucial technique for modern distributed...
Training foundation models, such as GPT-3 and PaLM, can be extremely exp...
Transformers are slow and memory-hungry on long sequences, since the tim...
Overparameterized neural networks generalize well but are expensive to t...
Recent advances in efficient Transformers have exploited either the spar...
Recurrent neural networks (RNNs), temporal convolutions, and neural diff...
A popular approach to model compression is to train an inexpensive stude...
An important goal of neural architecture search (NAS) is to automate-awa...
Modern neural network architectures use structured linear transformation...
A central problem in learning from sequential data is representing cumul...
Computing the permanent of a non-negative matrix is a core problem with...
Compressing word embeddings is important for deploying NLP models in mem...
Fast linear transforms are ubiquitous in machine learning, including the...
We investigate how to train kernel approximation methods that generalize...
The low displacement rank (LDR) framework for structured matrices repres...
Data augmentation, a technique in which a training set is expanded with...