Nearly Minimax Algorithms for Linear Bandits with Shared Representation
We give novel algorithms for multi-task and lifelong linear bandits with a shared representation. Specifically, we consider the setting where we play M linear bandits of dimension d, each for T rounds, and these M bandit tasks share a common k(≪ d)-dimensional linear representation. For both the multi-task setting, where we play the tasks concurrently, and the lifelong setting, where we play the tasks sequentially, we propose novel algorithms that achieve Õ(d√(kMT) + kM√(T)) regret bounds, matching the known minimax regret lower bound up to logarithmic factors and closing the gap in existing results [Yang et al., 2021]. Our main techniques include a more efficient estimator for the low-rank linear feature extractor and an accompanying novel analysis for this estimator.
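To make the shared-representation setting concrete, here is a minimal illustrative sketch (not the paper's algorithm): M linear regression tasks whose parameters all lie in a common k-dimensional subspace, with the subspace recovered by a top-k SVD of stacked per-task least-squares estimates. All dimensions and sample sizes below are assumptions chosen for the toy example.

```python
# Toy sketch of a shared low-rank representation across M linear tasks.
# NOT the paper's estimator; a simple SVD-based baseline for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, k, M, n = 20, 3, 10, 200  # ambient dim, shared dim, tasks, samples per task

# Shared feature extractor B (d x k, orthonormal columns) and per-task weights.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
W = rng.standard_normal((k, M))
Theta = B @ W  # true task parameters; column m is theta_m = B w_m

# Each task m: noisy linear observations y = X theta_m + noise,
# solved independently by ordinary least squares.
Theta_hat = np.empty((d, M))
for m in range(M):
    X = rng.standard_normal((n, d))
    y = X @ Theta[:, m] + 0.1 * rng.standard_normal(n)
    Theta_hat[:, m], *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimate the shared subspace: top-k left singular vectors of the
# stacked per-task estimates.
U, _, _ = np.linalg.svd(Theta_hat, full_matrices=False)
B_hat = U[:, :k]

# Subspace recovery error: spectral norm of the projection of span(B)
# onto the orthogonal complement of span(B_hat).
err = np.linalg.norm((np.eye(d) - B_hat @ B_hat.T) @ B, 2)
print(f"subspace estimation error: {err:.3f}")
```

Pooling estimates across tasks is what makes the shared structure exploitable: each task alone only sees a d-dimensional problem, but jointly the M tasks pin down the k-dimensional subspace.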