Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

10/26/2021
โˆ™
by   Jiatai Huang, et al.
โˆ™
0
โˆ™

We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem with unrestricted feedback delays. In contrast to the standard assumption that all losses are [0,1]-bounded, in our setting, losses can fall in a general bounded interval [-L, L], unknown to the agent before-hand. Furthermore, the feedback of each arm pull can experience arbitrary delays. We propose an algorithm named for this novel setting, which combines a recent banker online mirror descent technique and elaborately designed doubling tricks. We show that achieves ๐’ช(โˆš(K(D+T))L)ยท polylog(T, L) total regret, where T is the total number of steps and D is the total feedback delay. also outperforms existing algorithm for non-delayed (i.e., D=0) scale-free adversarial MAB problem instances. We also present a variant of for problem instances with non-negative losses (i.e., they range in [0, L] for some unknown L), achieving an ๐’ชฬƒ(โˆš(K(D+T))L) total regret, which is near-optimal compared to the ฮฉ(โˆš(KT)+โˆš(Dlog K)L) lower-bound ([Cesa-Bianchi et al., 2016]).

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset