Towards Instance Optimal Bounds for Best Arm Identification
In the classical best arm identification (Best-1-Arm) problem, we are given n stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least 1-δ, using as few samples as possible. The sample complexity of Best-1-Arm has attracted significant attention over the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance-wise sample complexity of Best-1-Arm. Given an instance I, let μ_[i] be the ith largest mean and Δ_[i]=μ_[1]-μ_[i] be the corresponding gap. H(I)=∑_{i=2}^n Δ_[i]^{-2} is the complexity of the instance. The gap-entropy conjecture states that Ω(H(I)·(ln δ^{-1}+Ent(I))) is an instance lower bound, where Ent(I) is an entropy-like term determined by the gaps, and that there is a δ-correct algorithm for Best-1-Arm with sample complexity O(H(I)·(ln δ^{-1}+Ent(I))+Δ_[2]^{-2} ln ln Δ_[2]^{-1}). If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-1-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires O(H(I)·(ln δ^{-1}+Ent(I))+Δ_[2]^{-2} ln ln Δ_[2]^{-1} polylog(n,δ^{-1})) samples in expectation. For the lower bound, we show that for any Gaussian Best-1-Arm instance with gaps of the form 2^{-k}, any δ-correct monotone algorithm requires Ω(H(I)·(ln δ^{-1}+Ent(I))) samples in expectation.
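For reference, the quantities and bounds stated in the abstract can be written in display form. This is only a restatement of the abstract's own formulas; the precise definition of the entropy-like term Ent(I) is given in the paper and is not reproduced here.

```latex
% Gaps and the instance hardness measure H(I)
\[
  \Delta_{[i]} = \mu_{[1]} - \mu_{[i]}, \qquad
  H(I) = \sum_{i=2}^{n} \Delta_{[i]}^{-2}.
\]
% Conjectured instance-wise lower bound (Ent(I) is the gap-entropy term)
\[
  \Omega\!\left( H(I)\cdot\bigl(\ln \delta^{-1} + \mathrm{Ent}(I)\bigr) \right)
\]
% Upper bound achieved by the algorithm in this work
\[
  O\!\left( H(I)\cdot\bigl(\ln \delta^{-1} + \mathrm{Ent}(I)\bigr)
    + \Delta_{[2]}^{-2}\,\ln\ln \Delta_{[2]}^{-1}\,
      \mathrm{polylog}(n, \delta^{-1}) \right)
\]
```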