Recent work has shown that, in generative modeling, cross-entropy loss i...
In reinforcement learning from human feedback, it is common to optimize ...
We show that a GPT-3 model can learn to express uncertainty about its ow...
Making language models bigger does not inherently make them better at fo...
We fine-tune GPT-3 to answer long-form questions using a text-based web-...
State-of-the-art language models can match human performance on many tas...
We say an algorithm is batch size-invariant if changes to the batch size...
We propose a benchmark to measure whether a language model is truthful i...
The NeurIPS 2020 Procgen Competition was designed as a centralized bench...
We introduce Phasic Policy Gradient (PPG), a reinforcement learning fram...
In this report, we introduce Procgen Benchmark, a suite of 16 procedural...