Max Nadeau | DeepAI


Featured Co-authors

Xin Chen (162 publications)
Dorsa Sadigh (102 publications)
Dylan Hadfield-Menell (38 publications)
Anca Dragan (33 publications)
Gabriel Kreiman (32 publications)
David Bau (31 publications)
David Krueger (31 publications)
Erdem Bıyık (27 publications)
Tomasz Korbak (18 publications)
Peter Hase (17 publications)
Thomas Krendl Gilbert (15 publications)

research ∙ 09/12/2023

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

Language models often exhibit behaviors that improve performance on a pr...

Maximilian Li, et al.

research ∙ 08/29/2023

Measurement Tampering Detection Benchmark

When training powerful AI systems to perform complex tasks, it may be ch...

Fabien Roger, et al.

research ∙ 07/27/2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Reinforcement learning from human feedback (RLHF) is a technique for tra...

Stephen Casper, et al.

research ∙ 07/07/2023

Discovering Variable Binding Circuitry with Desiderata

Recent work has shown that computation in language models may be human-u...

Xander Davies, et al.

research ∙ 10/07/2021

One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features

It is well understood that modern deep networks are vulnerable to advers...

Stephen Casper, et al.
