Collaborative Regret Minimization in Multi-Armed Bandits

01/26/2023
by Nikolai Karpov, et al.

In this paper, we study the collaborative learning model, which concerns the tradeoff between parallelism and communication overhead in multi-agent reinforcement learning. For a fundamental problem in bandit theory, regret minimization in multi-armed bandits, we present the first and almost tight tradeoffs between the number of rounds of communication between the agents and the regret of the collaborative learning process.
