Large language models are commonly trained on a mixture of filtered web ...
The crystallization of modeling methods around the Transformer architect...
Alternatives to backpropagation have long been studied to better underst...
In this work we introduce RITA: a suite of autoregressive generative mod...
Large pretrained Transformer language models have been shown to exhibit ...
Recent work has identified simple empirical scaling laws for language mo...
We introduce LightOn's Optical Processing Unit (OPU), the first photonic...
Randomized Numerical Linear Algebra (RandNLA) is a powerful class of met...
The performance of algorithms for neural architecture search strongly de...