Contextual bandits are widely used in industrial personalization systems...
Empirical risk minimization (ERM) is the workhorse of machine learning, ...
Contextual bandit algorithms are increasingly replacing non-adaptive A/B...
To balance exploration and exploitation, multi-armed bandit algorithms n...
We consider adaptive designs for a trial involving N individuals that we...
We design a new family of estimators for off-policy evaluation in contex...
Contextual bandit algorithms are sensitive to the estimation method of t...
We consider a team of reinforcement learning agents that concurrently op...
We consider a team of reinforcement learning agents that concurrently le...
Contextual bandit algorithms seek to learn a personalized treatment assi...