Large language models (LLMs) are routinely pre-trained on billions of to...
Memorization, or the tendency of large language models (LLMs) to output
...
How do large language models (LLMs) develop and evolve over the course o...
In recent years, the training requirements of many state-of-the-art Deep...
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive languag...
Understanding and visualizing the full-stack performance trade-offs and
...
The enormous amount of data and computation required to train DNNs have ...