Tight Memory-Regret Lower Bounds for Streaming Bandits

06/13/2023
by   Shaoang Li, et al.

In this paper, we investigate the streaming bandits problem, in which the learner aims to minimize regret while handling arms that arrive online under a sublinear arm-memory constraint. We establish a tight worst-case regret lower bound of Ω((TB)^α K^(1−α)), where α = 2^B / (2^(B+1) − 1), for any algorithm with time horizon T, number of arms K, and number of passes B. This result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, compared with the well-known Ω(√(KT)) lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory. Furthermore, we establish the first instance-dependent lower bound of Ω(T^(1/(B+1)) ∑_{Δ_x>0} μ^*/Δ_x) for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample-complexity analysis of a sequence of ϵ-optimal arm identification tasks, which may be of independent interest. To complement the lower bounds, we also provide a multi-pass algorithm that achieves a regret upper bound of Õ((TB)^α K^(1−α)) using constant arm memory.
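To make the exponent concrete, the following sketch (not from the paper; an illustration under the stated formula) evaluates α = 2^B / (2^(B+1) − 1) and the order of the worst-case bound Ω((TB)^α K^(1−α)) for a few pass counts B. Note that α = 2/3 in the single-pass case and α → 1/2 as B grows, approaching the classical √(KT) rate; function names here are hypothetical.

```python
def alpha(B: int) -> float:
    """Exponent in the memory-regret lower bound for B passes."""
    return 2**B / (2**(B + 1) - 1)

def regret_lower_bound_order(T: int, K: int, B: int) -> float:
    """Order of the worst-case bound (TB)^alpha * K^(1-alpha); constants omitted."""
    a = alpha(B)
    return (T * B)**a * K**(1 - a)

if __name__ == "__main__":
    # alpha shrinks toward 1/2 as the number of passes grows.
    for B in (1, 2, 5, 20):
        print(f"B={B:2d}  alpha={alpha(B):.4f}")
```

For example, B = 1 gives α = 2/3 and B = 2 gives α = 4/7, so each extra pass weakens the T-dependence of the lower bound toward √T.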
