Thompson Sampling on Asymmetric α-Stable Bandits

03/19/2022
by Zhendong Shi, et al.

In reinforcement learning algorithm design, handling the exploration-exploitation dilemma is particularly important. The multi-armed bandit problem captures this dilemma: a learner must dynamically balance exploring arms whose reward distributions are still uncertain against exploiting the arm that currently looks best. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been applied to data following a variety of distributions. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow unknown asymmetric α-stable distributions, and we explore its applications in modelling financial and wireless data.
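To make the setting concrete, here is a minimal sketch of a bandit whose arms pay asymmetric α-stable rewards, played by a generic Thompson Sampling loop. Two assumptions apply. First, rewards are drawn with the standard Chambers-Mallows-Stuck method, restricted to 1 < α ≤ 2 so that each arm's mean exists and equals its location parameter. Second, the agent uses a plain Gaussian posterior (N(0, 1) prior, unit-variance likelihood). That posterior is a swapped-in baseline, not the α-stable posterior developed in the paper. The names `sample_stable` and `gaussian_thompson` and the arm parameters are hypothetical.

```python
import math
import random


def sample_stable(alpha, beta, loc, scale, rng):
    """Draw one alpha-stable variate via the Chambers-Mallows-Stuck method.

    Assumes 1 < alpha <= 2 (so the mean exists and equals loc) and -1 <= beta <= 1.
    """
    if not (1.0 < alpha <= 2.0):
        raise ValueError("this sketch assumes 1 < alpha <= 2")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # V ~ Uniform(-pi/2, pi/2)
    w = rng.expovariate(1.0)                     # W ~ Exp(1)
    while w == 0.0:                              # guard against a zero draw
        w = rng.expovariate(1.0)
    t = beta * math.tan(math.pi * alpha / 2)     # skewness term; beta != 0 gives asymmetry
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    x = (s * math.sin(alpha * (v + b)) / math.cos(v) ** (1 / alpha)
         * (math.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))
    return loc + scale * x                       # shift and scale the standard variate


def gaussian_thompson(arms, horizon, rng):
    """Generic Gaussian Thompson Sampling (a baseline, not the paper's posterior).

    arms: list of (alpha, beta, loc, scale) tuples, one per arm.
    Returns the number of times each arm was pulled.
    """
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    for _ in range(horizon):
        # Posterior under an N(0, 1) prior and unit-variance Gaussian likelihood:
        # mean = sum / (n + 1), variance = 1 / (n + 1).
        draws = [rng.gauss(sums[i] / (counts[i] + 1), 1.0 / math.sqrt(counts[i] + 1))
                 for i in range(k)]
        chosen = max(range(k), key=lambda i: draws[i])  # act greedily on the sample
        reward = sample_stable(*arms[chosen], rng)
        counts[chosen] += 1
        sums[chosen] += reward
    return counts


# Usage: two hypothetical skewed arms with means 0.0 and 1.0.
rng = random.Random(0)
arms = [(1.5, 0.5, 0.0, 1.0), (1.5, -0.5, 1.0, 1.0)]
counts = gaussian_thompson(arms, horizon=500, rng=rng)
print(counts)
```

Occasional extreme draws from a heavy-tailed arm can drag the empirical mean far from the true location, which a Gaussian posterior does not anticipate. A posterior that models α-stable rewards directly is meant to address this.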
