Recent advances in large language model (LLM) pretraining have led to hi...
We present QLoRA, an efficient finetuning approach that reduces memory u...
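A minimal sketch of a QLoRA-style finetuning setup, assuming the Hugging Face transformers, peft, and bitsandbytes stack; the base model name, target modules, and LoRA hyperparameters below are illustrative choices, not details from the abstract:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4 (with double quantization),
# keep compute in bfloat16, and train only small LoRA adapter matrices on top.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "meta-llama/Llama-2-7b-hf"  # hypothetical base model
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # illustrative attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA parameters require gradients
```

The memory savings come from holding the large base weights in 4-bit while gradients and optimizer state exist only for the small adapter matrices.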
Large and sparse feed-forward networks (S-FFN) such as Mixture-of-Expert...
We introduce new methods for 1) accelerating and 2) stabilizing training...
Many deep learning applications benefit from using large models with bil...
Quantization methods reduce the number of bits required to represent eac...
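For context, a hedged sketch of the round-trip such methods rely on: block-wise absmax int8 quantization and dequantization in PyTorch. The block size and tensor shape are illustrative, and this is a generic scheme rather than any specific paper's method:

```python
import torch

def blockwise_absmax_quantize(x: torch.Tensor, block_size: int = 64):
    """Quantize a 1-D float tensor to int8 in blocks, one absmax scale per block."""
    pad = (-x.numel()) % block_size
    x_padded = torch.nn.functional.pad(x, (0, pad))
    blocks = x_padded.view(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values.clamp(min=1e-8)
    q = torch.clamp(torch.round(blocks / scales * 127), -127, 127).to(torch.int8)
    return q, scales, x.numel()

def blockwise_dequantize(q: torch.Tensor, scales: torch.Tensor, numel: int):
    """Recover an approximate float tensor from int8 blocks and per-block scales."""
    x = (q.float() / 127) * scales
    return x.reshape(-1)[:numel]

w = torch.randn(1000)
q, s, n = blockwise_absmax_quantize(w)
w_hat = blockwise_dequantize(q, s, n)
print((w - w_hat).abs().max())  # per-block scaling keeps the rounding error small
```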
Many NLP tasks benefit from using large language models (LLMs) that ofte...
Large language models have been widely adopted but require significant G...
We present Branch-Train-Merge (BTM), a communication-efficient algorithm...
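A hedged sketch of the merge step in a Branch-Train-Merge-style workflow: uniform parameter averaging of independently trained copies of a shared architecture. The helper below is an illustrative stand-in, not the paper's implementation:

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average the parameters of models that share an architecture."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Usage (paths are hypothetical): each expert is branched from a common seed model,
# trained on its own data subset with no cross-communication, then merged.
# merged = average_state_dicts([torch.load(p) for p in expert_checkpoint_paths])
```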
The infrastructure necessary for training state-of-the-art models is bec...
Stateful optimizers maintain gradient statistics over time, e.g., the ex...
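As a concrete reference for "gradient statistics over time", a minimal sketch of the exponentially smoothed first and second moments that an Adam-style stateful optimizer keeps per parameter; the hyperparameter values are the usual defaults, not taken from the abstract:

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on tensors; m and v are the per-parameter optimizer state."""
    m = beta1 * m + (1 - beta1) * grad          # smoothed gradient (first moment)
    v = beta2 * v + (1 - beta2) * grad * grad   # smoothed squared gradient (second moment)
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (v_hat.sqrt() + eps)
    return param, m, v
```

Kept in 32-bit, these two state tensors cost 8 bytes per model parameter, which is the memory overhead that reduced-precision optimizer states target.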
We introduce a new balanced assignment of experts (BASE) layer for large...
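A hedged sketch of what a balanced token-to-expert assignment can look like: each expert receives exactly num_tokens / num_experts tokens, chosen by solving a linear assignment over token-expert affinity scores. This uses scipy's generic solver as a stand-in, not the paper's algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_assignment(scores: np.ndarray) -> np.ndarray:
    """scores: [num_tokens, num_experts] affinities; num_tokens must be divisible
    by num_experts. Returns an expert index per token with equal expert load."""
    num_tokens, num_experts = scores.shape
    capacity = num_tokens // num_experts
    # Replicate each expert column `capacity` times so the problem is one-to-one,
    # then maximize total affinity (minimize negated scores).
    cost = -np.repeat(scores, capacity, axis=1)
    rows, cols = linear_sum_assignment(cost)
    assignment = np.empty(num_tokens, dtype=int)
    assignment[rows] = cols // capacity
    return assignment

scores = np.random.randn(8, 2)       # 8 tokens, 2 experts
print(balanced_assignment(scores))   # each expert is assigned exactly 4 tokens
```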
We demonstrate the possibility of what we call sparse learning: accelera...
Many Machine Reading and Natural Language Understanding tasks require re...
The creation of practical deep learning data-products often requires par...