To use reinforcement learning from human feedback (RLHF) in practical ap...
Reinforcement learning from human feedback (RLHF) is a technique for tra...
We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL...
Interpretability research aims to build tools for understanding machine ...
Stable Diffusion is a recent open-source image generation model comparab...
Inverse Reinforcement Learning (IRL) is a powerful paradigm for inferrin...
Reinforcement learning (RL) commonly assumes access to well-specified re...
We study sequential decision-making with known rewards and unknown const...
Learning optimal control policies directly on physical systems is challe...
Machine Learning (ML) increasingly informs the allocation of opportuniti...
Since reward functions are hard to specify, recent work has focused on l...
For many reinforcement learning (RL) applications, specifying a reward i...
Designing reward functions for reinforcement learning is difficult: besi...
Current reinforcement learning methods fail if the reward function is im...
The ability to track and monitor relevant and important news in real-tim...