Collaborative Regret Minimization in Multi-Armed Bandits

01/26/2023


In this paper, we study the collaborative learning model, which captures the tradeoff between parallelism and communication overhead in multi-agent reinforcement learning. For regret minimization in multi-armed bandits, a fundamental problem in bandit theory, we present the first almost-tight tradeoffs between the number of rounds of communication among the agents and the regret of the collaborative learning process.
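
To make the rounds-versus-regret tradeoff concrete, here is a minimal Python sketch. It is not the algorithm from this paper. It is a standard successive-elimination scheme adapted to the collaborative setting under stated assumptions. In the sketch, K agents pull Bernoulli arms, and they pool their statistics only at the end of each of R equal-length phases, so R is the number of communication rounds. The function name `collaborative_elimination`, the phase schedule, and the confidence radius are all hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Tuple


def collaborative_elimination(
    means: List[float],
    num_agents: int,
    horizon: int,
    num_rounds: int,
    seed: int = 0,
) -> Tuple[float, List[int]]:
    """Hypothetical collaborative successive elimination (illustrative only).

    num_agents agents each pull Bernoulli arms for `horizon` steps and share
    statistics only at the end of each of `num_rounds` phases, so exactly
    num_rounds communication rounds occur. Returns the total pseudo-regret
    summed over all agents and the arms still active at the end.
    """
    rng = random.Random(seed)
    n = len(means)
    best = max(means)
    active = list(range(n))
    counts = [0] * n  # pooled pull counts, known to every agent after a sync
    sums = [0.0] * n  # pooled reward sums, known to every agent after a sync
    log_term = math.log(2 * n * num_agents * horizon)  # confidence ~ 1/(K*T)
    regret = 0.0

    # Assumption: split each agent's horizon into num_rounds nearly equal phases.
    base, extra = divmod(horizon, num_rounds)
    for r in range(num_rounds):
        length = base + (1 if r < extra else 0)
        new_counts = [0] * n
        new_sums = [0.0] * n
        for agent in range(num_agents):
            for t in range(length):
                # Round-robin over the shared active set; no communication here.
                arm = active[(t + agent) % len(active)]
                reward = 1.0 if rng.random() < means[arm] else 0.0
                new_counts[arm] += 1
                new_sums[arm] += reward
                regret += best - means[arm]

        # Communication round: agents pool their local statistics.
        for arm in range(n):
            counts[arm] += new_counts[arm]
            sums[arm] += new_sums[arm]

        # Elimination needs an estimate for every active arm.
        if any(counts[a] == 0 for a in active):
            continue
        mean = {a: sums[a] / counts[a] for a in active}
        rad = {a: math.sqrt(log_term / (2 * counts[a])) for a in active}
        best_lcb = max(mean[a] - rad[a] for a in active)
        # Drop arms whose upper confidence bound is below the best lower bound.
        active = [a for a in active if mean[a] + rad[a] >= best_lcb]

    return regret, active


if __name__ == "__main__":
    for rounds in (1, 4, 16):
        total, survivors = collaborative_elimination(
            [0.5, 0.4, 0.3], num_agents=4, horizon=3000, num_rounds=rounds
        )
        print(rounds, round(total, 1), survivors)
```

With a single round, the agents cannot eliminate anything before the horizon ends, so the sketch degenerates to uniform exploration. More rounds let suboptimal arms be dropped earlier, which typically lowers regret at the cost of extra communication. That is the kind of tradeoff the abstract refers to, though the paper's actual bounds are not reproduced here.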
