Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-TSL and GLM-FPL. GLM-TSL samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. GLM-FPL, a new algorithm proposed in this work, fits a GLM to a randomly perturbed history of past rewards. We prove a Õ(d √(n) + d^2) upper bound on the n-round regret of GLM-TSL, where d is the number of features. This is the first regret bound of a Thompson sampling-like algorithm in GLM bandits where the leading term is Õ(d √(n)). We apply both GLM-TSL and GLM-FPL to logistic and neural network bandits, and show that they perform well empirically. In more complex models, GLM-FPL is significantly faster. Our results showcase the role of randomization, beyond posterior sampling, in exploration.
READ FULL TEXT