Recent advances in large language model (LLM) pretraining have led to hi...
We present QLoRA, an efficient finetuning approach that reduces memory u...
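A minimal sketch of a QLoRA-style finetuning setup, assuming the Hugging Face transformers, peft, and bitsandbytes stack; the base model name, target modules, and LoRA hyperparameters below are illustrative choices, not details from the abstract:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4 (with double quantization),
# keep compute in bfloat16, and train only small LoRA adapter matrices on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters require gradients
```

The memory savings come from holding the large base weights in 4-bit while gradients and optimizer state exist only for the small adapter matrices.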
Large and sparse feed-forward networks (S-FFN) such as Mixture-of-Expert...
We introduce new methods for 1) accelerating and 2) stabilizing training...
Many deep learning applications benefit from using large models with bil...
Quantization methods reduce the number of bits required to represent eac...
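For context, a hedged sketch of the round-trip such methods rely on: block-wise absmax int8 quantization and dequantization in PyTorch. The block size and tensor shape are illustrative, and this is a generic scheme rather than any specific paper's method:

```python
import torch

def blockwise_absmax_quantize(x: torch.Tensor, block_size: int = 64):
    """Quantize a 1-D float tensor to int8 in blocks, one absmax scale per block."""
    pad = (-x.numel()) % block_size
    x_padded = torch.nn.functional.pad(x, (0, pad))
    blocks = x_padded.view(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8)
    q = torch.clamp(torch.round(blocks / scales * 127), -127, 127).to(torch.int8)
    return q, scales, x.numel()

def blockwise_dequantize(q: torch.Tensor, scales: torch.Tensor, numel: int):
    """Recover an approximate float tensor from int8 blocks and per-block scales."""
    x = (q.float() / 127) * scales
    return x.reshape(-1)[:numel]

w = torch.randn(1000)
q, s, n = blockwise_absmax_quantize(w)
w_hat = blockwise_dequantize(q, s, n)
print((w - w_hat).abs().max())  # per-block scaling keeps the rounding error small
```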
Many NLP tasks benefit from using large language models (LLMs) that ofte...
Large language models have been widely adopted but require significant G...
We present Branch-Train-Merge (BTM), a communication-efficient algorithm...
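A hedged sketch of the merge step in a Branch-Train-Merge-style workflow: uniform parameter averaging of independently trained copies of a shared architecture. The helper below is an illustrative stand-in, not the paper's implementation:

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average the parameters of models that share an architecture."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Usage (paths are hypothetical): each expert is branched from a common seed model,
# trained on its own data subset with no cross-communication, then merged.
# merged = average_state_dicts([torch.load(p) for p in expert_checkpoint_paths])
```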
The infrastructure necessary for training state-of-the-art models is bec...
Stateful optimizers maintain gradient statistics over time, e.g., the ex...
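As a concrete reference for "gradient statistics over time", a minimal sketch of the exponentially smoothed first and second moments that an Adam-style stateful optimizer keeps per parameter; the hyperparameter values are the usual defaults, not taken from the abstract:

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on tensors; m and v are the per-parameter optimizer state."""
    m = beta1 * m + (1 - beta1) * grad          # smoothed gradient (first moment)
    v = beta2 * v + (1 - beta2) * grad * grad   # smoothed squared gradient (second moment)
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat.sqrt() + eps)
    return param, m, v
```

Kept in 32-bit, these two state tensors cost 8 bytes per model parameter, which is the memory overhead that reduced-precision optimizer states target.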
We introduce a new balanced assignment of experts (BASE) layer for large...
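A hedged sketch of what a balanced token-to-expert assignment can look like: each expert receives exactly num_tokens / num_experts tokens, chosen by solving a linear assignment over token-expert affinity scores. This uses scipy's generic solver as a stand-in, not the paper's algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_assignment(scores: np.ndarray) -> np.ndarray:
    """scores: [num_tokens, num_experts] affinities; num_tokens must be divisible
    by num_experts. Returns an expert index per token with equal expert load."""
    num_tokens, num_experts = scores.shape
    capacity = num_tokens // num_experts
    # Replicate each expert column `capacity` times so the problem is one-to-one,
    # then maximize total affinity (minimize negated scores).
    cost = -np.repeat(scores, capacity, axis=1)
    rows, cols = linear_sum_assignment(cost)
    assignment = np.empty(num_tokens, dtype=int)
    assignment[rows] = cols // capacity
    return assignment

scores = np.random.randn(8, 2)       # 8 tokens, 2 experts
print(balanced_assignment(scores))   # each expert is assigned exactly 4 tokens
```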
We demonstrate the possibility of what we call sparse learning: accelera...
Many Machine Reading and Natural Language Understanding tasks require re...
The creation of practical deep learning data-products often requires par...