Recent work has shown that, in generative modeling, cross-entropy loss i...
In reinforcement learning from human feedback, it is common to optimize ...
We show that a GPT-3 model can learn to express uncertainty about its ow...
Making language models bigger does not inherently make them better at fo...
We fine-tune GPT-3 to answer long-form questions using a text-based web-...
State-of-the-art language models can match human performance on many tas...
We say an algorithm is batch size-invariant if changes to the batch size...
We propose a benchmark to measure whether a language model is truthful i...
The NeurIPS 2020 Procgen Competition was designed as a centralized bench...
We introduce Phasic Policy Gradient (PPG), a reinforcement learning fram...
In this report, we introduce Procgen Benchmark, a suite of 16 procedural...