This paper introduces a novel backup strategy for Monte-Carlo Tree Searc...
The reproducibility of many experimental results in Deep Reinforcement L...
We study the problem of episodic reinforcement learning in continuous st...
In decision-making problems such as the multi-armed bandit, an agent lea...
We consider an online estimation problem involving a set of agents. Each...
In this paper, we study the stochastic bandits problem with k unknown he...
We revisit the method of mixture technique, also known as the Laplace me...
We consider a multi-armed bandit problem specified by a set of one-dimen...
The stochastic multi-arm bandit problem has been extensively studied und...
In this paper we propose the first multi-armed bandit algorithm based on...
Policy gradient algorithms have proven to be successful in diverse decis...
We consider a regret minimization task under the average-reward criterio...
We study a structured variant of the multi-armed bandit problem specifie...
We consider a multi-armed bandit problem specified by a set of Gaussian ...
The upper confidence reinforcement learning (UCRL2) strategy introduced ...
We develop a framework for the adaptive model predictive control of a li...
Leveraging an equivalence property in the state-space of a Markov Decisi...
We consider the setup of stochastic multi-armed bandits in the case when...
We study the problem of learning the transition matrices of a set of Mar...
We consider the problem of online planning in a Markov Decision Process ...
This work studies the design of safe control policies for large-scale no...
The problem of reinforcement learning in an unknown and discrete Markov ...
We consider a variation on the problem of prediction with expert advice,...
We consider the problem of streaming kernel regression, when the observa...
We consider parametric exponential families of dimension K on the real l...
We consider a non-stationary formulation of the stochastic multi-armed b...