No-Regret Linear Bandits beyond Realizability

02/26/2023
by Chong Liu, et al.

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter ϵ that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever ϵ > 0. We describe a more natural model of misspecification which only requires the approximation error at each input x to be proportional to the suboptimality gap at x. It captures the intuition that, for optimization problems, near-optimal regions should matter more, and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm – designed for the realizable case – is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal O(√T) regret on problems for which the best previously known regret bound is almost linear in the time horizon T. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
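Since the abstract centers on LinUCB and a gap-adjusted notion of misspecification (roughly, the error |f(x) − θᵀx| at each arm x is bounded by a multiple of the gap f(x*) − f(x)), a minimal sketch of the standard LinUCB update may help fix ideas. The NumPy code below is an illustrative implementation of vanilla LinUCB over a finite action set, not the paper's exact setup; the parameters reg and beta and the toy reward function are assumptions for illustration only.

    import numpy as np

    def linucb(X, reward_fn, T, reg=1.0, beta=1.0):
        """Minimal LinUCB sketch over a finite action set X (n_actions x d)."""
        n, d = X.shape
        V = reg * np.eye(d)              # regularized design matrix V_t
        b = np.zeros(d)                  # running sum of r_s * x_s
        history = []
        for t in range(T):
            V_inv = np.linalg.inv(V)
            theta_hat = V_inv @ b        # ridge-regression estimate of the linear model
            # Optimistic index: estimated reward plus exploration bonus ||x||_{V^{-1}}.
            bonus = np.sqrt(np.einsum('ij,jk,ik->i', X, V_inv, X))
            i = int(np.argmax(X @ theta_hat + beta * bonus))
            x = X[i]
            r = reward_fn(x)             # reward may be non-linear, as in the paper's setting
            V += np.outer(x, x)
            b += r * x
            history.append((i, r))
        return history

    # Toy usage: a non-linear reward whose deviation from its best linear fit shrinks
    # near the optimum, loosely mimicking gap-adjusted misspecification (illustrative only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    theta_star = np.ones(5) / np.sqrt(5)
    f = lambda x: x @ theta_star - 0.1 * (1.0 - x @ theta_star) ** 2
    history = linucb(X, lambda x: f(x) + 0.01 * rng.normal(), T=200)

The point of the toy reward is that its non-linear term is small near the maximizer and grows only in suboptimal regions, the regime in which the paper argues LinUCB remains a no-regret algorithm; it is a sketch of the setting, not the authors' experimental setup.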
