Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits
We propose improved fixed-design confidence bounds for the linear logistic model. Our bounds significantly improve upon the state-of-the-art bounds of Li et al. (2017) by leveraging the self-concordance of the logistic loss, as pioneered by Faury et al. (2020). Specifically, our confidence width does not scale with the problem-dependent parameter 1/κ, where κ is the worst-case variance of an arm's reward; in the worst case, 1/κ scales exponentially with the norm of the unknown linear parameter θ^*. Instead, our bound scales directly with the local variance induced by θ^*. We present applications of our novel bounds to two logistic bandit problems: regret minimization and pure exploration. Our analysis shows that the new confidence bounds improve upon previous state-of-the-art performance guarantees.
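To see why a 1/κ dependence is costly, note that the variance of a Bernoulli reward under the logistic model is μ̇(z) = σ(z)(1 − σ(z)), where σ is the sigmoid and z = x^T θ^*. Since 1/μ̇(z) = 2 + e^z + e^{−z}, the inverse variance grows exponentially in |z|. The following sketch (illustrative only; a unit-norm arm aligned with θ^* is assumed as the worst case) makes this concrete:

```python
import math

def sigmoid(z):
    """Logistic link function."""
    return 1.0 / (1.0 + math.exp(-z))

def mu_dot(z):
    """Variance of a Bernoulli reward with mean sigmoid(z)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# For an arm x aligned with theta^* (unit norms assumed), z = ||theta^*||.
# The worst-case inverse variance 1/kappa then grows exponentially in z,
# since 1/mu_dot(z) = 2 + e^z + e^{-z}.
for norm in [1, 5, 10]:
    inv_kappa = 1.0 / mu_dot(norm)
    print(f"||theta*|| = {norm:2d}  ->  1/kappa ≈ {inv_kappa:.1f}")
```

A confidence width proportional to 1/κ therefore degrades exponentially as ‖θ^*‖ grows, whereas a bound driven by the local variance μ̇(x^T θ^*) at the arms actually played avoids this blow-up.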