Randomized Exploration in Generalized Linear Bandits

06/21/2019
by   Branislav Kveton, et al.

We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. GLM-FPL, a new algorithm proposed in this work, fits a GLM to a randomly perturbed history of past rewards. We prove an Õ(d√n + d^2) upper bound on the n-round regret of GLM-TSL, where d is the number of features. This is the first regret bound for a Thompson sampling-like algorithm in GLM bandits whose leading term is Õ(d√n). We apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show that they perform well empirically. In more complex models, GLM-FPL is significantly faster. Our results showcase the role of randomization, beyond posterior sampling, in exploration.
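To make the two exploration schemes concrete, below is a minimal sketch of their core steps for a logistic bandit, assuming NumPy and binary past rewards. The function names (fit_logistic_map, glm_tsl_sample, glm_fpl_sample), the exploration scale a, and the prior precision lam are illustrative stand-ins for the quantities tuned and analyzed in the paper, not the authors' reference implementation.

import numpy as np

def fit_logistic_map(X, y, lam=1.0, n_iter=20):
    # MAP estimate of logistic-regression weights under a Gaussian
    # prior N(0, I/lam), computed by Newton's method. Returns the
    # estimate and the Hessian at the optimum.
    d = X.shape[1]
    theta = np.zeros(d)
    H = lam * np.eye(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))          # predicted means
        g = X.T @ (p - y) + lam * theta               # gradient
        W = p * (1.0 - p)                             # GLM weights
        H = (X * W[:, None]).T @ X + lam * np.eye(d)  # Hessian
        theta = theta - np.linalg.solve(H, g)         # Newton step
    return theta, H

def glm_tsl_sample(X, y, a=1.0, lam=1.0):
    # GLM-TSL core step: draw a model from the Laplace approximation
    # N(theta_hat, a^2 H^{-1}) to the posterior.
    theta_hat, H = fit_logistic_map(X, y, lam)
    return np.random.multivariate_normal(theta_hat, a * a * np.linalg.inv(H))

def glm_fpl_sample(X, y, a=1.0, lam=1.0):
    # GLM-FPL core step: refit the GLM to rewards perturbed by i.i.d.
    # Gaussian noise; no posterior covariance is ever formed.
    z = y + a * np.random.randn(len(y))               # perturbed history
    theta, _ = fit_logistic_map(X, z, lam)
    return theta

if __name__ == "__main__":
    np.random.seed(0)
    X = np.random.randn(100, 5)                       # hypothetical past features
    y = (np.random.rand(100) < 0.5).astype(float)     # hypothetical past rewards
    arms = np.random.randn(10, 5)                     # hypothetical arm set
    theta = glm_tsl_sample(X, y)                      # or glm_fpl_sample(X, y)
    print("chosen arm:", int(np.argmax(arms @ theta)))

The design difference is visible in the two sampling routines: GLM-TSL must form and invert the Hessian to sample from the Laplace approximation, whereas GLM-FPL only refits the model on noise-perturbed rewards, which is consistent with the abstract's observation that GLM-FPL is significantly faster in more complex models such as neural networks.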
