Near Optimal Adversarial Attack on UCB Bandits
We consider a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a UCB principle into pulling some non-optimal target arm T - o(T) times with a cumulative cost that scales as √(log T), where T is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to loglog T factors, showing our attack to be near optimal.
READ FULL TEXT