Exploiting Correlation in Finite-Armed Structured Bandits

10/18/2018
by Samarth Gupta, et al.

We consider a correlated multi-armed bandit problem in which the rewards of the arms are correlated through a hidden parameter. Our approach exploits this correlation to identify some arms as sub-optimal and pulls them only O(1) times. This significantly reduces cumulative regret; in fact, our algorithm achieves bounded (i.e., O(1)) regret whenever possible. By analyzing regret lower bounds, we also provide explicit conditions under which bounded regret is possible. We propose several variants of our approach that generalize classical bandit algorithms such as UCB, Thompson sampling, and KL-UCB to the structured bandit setting, and we empirically demonstrate their superiority via simulations.
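To make the idea of pulling sub-optimal arms only O(1) times concrete, here is a minimal Python sketch of a UCB-style rule restricted to arms that are still plausibly optimal. It is an illustration under stated assumptions, not the authors' algorithm: it assumes the mean reward of each arm is a known function of the hidden parameter, that the parameter lies in a small finite candidate set, and that rewards are Gaussian. The names `structured_ucb`, `radius`, `mean_fns`, and `thetas` are hypothetical.

```python
import math
import random


def radius(t, n):
    """UCB-style confidence radius at round t after n pulls (infinite if unpulled)."""
    return math.inf if n == 0 else math.sqrt(2.0 * math.log(t) / n)


def structured_ucb(mean_fns, thetas, theta_star, horizon, noise=0.1, seed=0):
    """Run a UCB rule over 'competitive' arms only; return pull counts per arm.

    Assumptions (not from the abstract): mean_fns[a](theta) is the known mean
    reward of arm a, thetas is a finite candidate set for the hidden parameter,
    and rewards are Gaussian around mean_fns[a](theta_star).
    """
    rng = random.Random(seed)
    k = len(mean_fns)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(k)]
        # Keep parameter values consistent with every arm's empirical mean.
        plausible = [
            th for th in thetas
            if all(abs(mean_fns[a](th) - means[a]) <= radius(t, counts[a])
                   for a in range(k))
        ]
        if not plausible:  # fall back to the full set if everything was ruled out
            plausible = list(thetas)
        # An arm is competitive if it is optimal for some plausible parameter.
        competitive = sorted(
            {max(range(k), key=lambda a: mean_fns[a](th)) for th in plausible}
        )
        # Standard UCB index, but maximized over competitive arms only.
        arm = max(competitive, key=lambda a: means[a] + radius(t, counts[a]))
        reward = mean_fns[arm](theta_star) + rng.gauss(0.0, noise)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Toy instance: three arms whose means depend on one hidden parameter.
mean_fns = [lambda th: th, lambda th: 1.0 - th, lambda th: 0.4]
thetas = [0.0, 0.5, 1.0]
counts = structured_ucb(mean_fns, thetas, theta_star=1.0, horizon=500)
print(counts)
```

In this toy instance the third arm is not optimal for any candidate parameter value, so it is never pulled, and the second arm stops being competitive once the observations rule out the parameter values for which it would be optimal.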
