On No-Sensing Adversarial Multi-player Multi-armed Bandits with Collision Communications

11/02/2020
by   Chengshuai Shi, et al.
5

We study the notoriously difficult no-sensing adversarial multi-player multi-armed bandits (MP-MAB) problem from a new perspective. Instead of focusing on the hardness of multiple players, we introduce a new dimension of hardness, called attackability. All adversaries can be categorized based on the attackability and we introduce Adversary-Adaptive Collision-Communication (A2C2), a family of algorithms with forced-collision communication among players. Both attackability-aware and unaware settings are studied, and information-theoretic tools of the Z-channel model and error-correction coding are utilized to address the challenge of implicit communication without collision information in an adversarial environment. For the more challenging attackability-unaware problem, we propose a simple method to estimate the attackability enabled by a novel error-detection repetition code and randomized communication for synchronization. Theoretical analysis proves that asymptotic attackability-dependent sublinear regret can be achieved, with or without knowing the attackability. In particular, the asymptotic regret does not have an exponential dependence on the number of players, revealing a fundamental tradeoff between the two dimensions of hardness in this problem.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro