Large language models are commonly trained on a mixture of filtered web ...
The crystallization of modeling methods around the Transformer architect...
Alternatives to backpropagation have long been studied to better underst...
In this work we introduce RITA: a suite of autoregressive generative mod...
Large pretrained Transformer language models have been shown to exhibit ...
Recent work has identified simple empirical scaling laws for language mo...
We introduce LightOn's Optical Processing Unit (OPU), the first photonic...
Randomized Numerical Linear Algebra (RandNLA) is a powerful class of met...
The performance of algorithms for neural architecture search strongly de...