Asymptotic Randomised Control with applications to bandits

10/14/2020
by   Samuel N. Cohen, et al.
0

We consider a general multi-armed bandit problem with correlated (and simple contextual and restless) elements, as a relaxed control problem. By introducing an entropy premium, we obtain a smooth asymptotic approximation to the value function. This yields a novel semi-index approximation of the optimal decision process, obtained numerically by solving a fixed point problem, which can be interpreted as explicitly balancing an exploration-exploitation trade-off. Performance of the resulting Asymptotic Randomised Control (ARC) algorithm compares favourably with other approaches to correlated multi-armed bandits.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset