Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

10/26/2021

∙

We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem with unrestricted feedback delays. In contrast to the standard assumption that all losses are [0,1]-bounded, in our setting, losses can fall in a general bounded interval [-L, L], unknown to the agent before-hand. Furthermore, the feedback of each arm pull can experience arbitrary delays. We propose an algorithm named for this novel setting, which combines a recent banker online mirror descent technique and elaborately designed doubling tricks. We show that achieves 𝒪(√(K(D+T))L)· polylog(T, L) total regret, where T is the total number of steps and D is the total feedback delay. also outperforms existing algorithm for non-delayed (i.e., D=0) scale-free adversarial MAB problem instances. We also present a variant of for problem instances with non-negative losses (i.e., they range in [0, L] for some unknown L), achieving an 𝒪̃(√(K(D+T))L) total regret, which is near-optimal compared to the Ω(√(KT)+√(Dlog K)L) lower-bound ([Cesa-Bianchi et al., 2016]).

READ FULL TEXT

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

Sign in with Google

Consider DeepAI Pro