Deep neural networks are widely known for their remarkable effectiveness...
Recent work reported the label alignment property in a supervised learni...
A variety of theoretically-sound policy gradient algorithms exist for th...
Dyna-style reinforcement learning (RL) agents improve sample efficiency ...
For multi-valued functions—such as when the conditional distribution on
...
Policy gradient methods are widely used for control in reinforcement
lea...
There is growing evidence that converting targets to soft targets in
sup...