Stochastic gradient descent (SGD) and adaptive gradient methods, such as...
Restricting the variance of a policy's return is a popular choice in ris...
Efficient exploration is critical in cooperative deep Multi-Agent Reinfo...
Reinforcement learning (RL) agents can leverage batches of previously co...
Recent work reported the label alignment property in a supervised learni...
Artificial neural networks are promising as general function approximato...
Policy gradient (PG) estimators for softmax policies are ineffective wit...
Model-based reinforcement learning (MBRL) can significantly improve samp...
Q-learning suffers from overestimation bias, because it approximates the...
For multi-valued functions—such as when the conditional distribution on ...
Model-based reinforcement learning has been empirically demonstrated as ...
Representation learning is critical to the success of modern large-scale...
Dyna is an architecture for model-based reinforcement learning (RL), whe...
Value-based approaches can be difficult to use in continuous action spac...
Recent work has shown that reinforcement learning (RL) is a promising ap...
Model-based strategies for control are critical to obtain sample efficie...
The family of temporal difference (TD) methods spans a spectrum from comp...
Balancing between computational efficiency and sample efficiency is an i...