Collecting and leveraging data with good coverage properties plays a cru...
We study the autonomous exploration (AX) problem proposed by Lim & Aue...
In contextual linear bandits, the reward function is assumed to be a lin...
We study the problem of representation learning in stochastic contextual...
We study the sample complexity of learning an ϵ-optimal policy in the St...
Optimistic algorithms have been extensively studied for regret minimizat...
Elimination algorithms for bandit identification, which prune the plausi...
In probably approximately correct (PAC) reinforcement learning (RL), an ...
We study the problem of the identification of m arms with largest means ...
We study the role of the representation of state-action value functions ...
We derive a novel asymptotic problem-dependent lower bound for regret mi...
Many real-world domains are subject to a structured non-stationarity whi...
The linear contextual bandit literature is mostly focused on the design ...
In the contextual linear bandit setting, algorithms built on the optimis...
We are interested in how to design reinforcement learning agents that pr...
We study finite-armed stochastic bandits where the rewards of each arm m...
Traditional model-based reinforcement learning approaches learn a model ...
Mutual information has been successfully adopted in filter feature-selec...
We consider the transfer of experience samples (i.e., tuples < s, a, s',...