A tree-based online search algorithm iteratively simulates trajectories ...
Thompson sampling is a well-known approach for balancing exploration and...
In the practice of sequential decision making, agents are often designed...
This paper proposes PuRL - a deep reinforcement learning (RL) based algo...
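As an illustration of the Thompson sampling entry above, here is a minimal sketch for a Bernoulli multi-armed bandit, the textbook setting for the method. It is not taken from any of the listed papers: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example reward probabilities are all assumptions made for this example. Each round draws one plausible success rate per arm from its Beta posterior and plays the arm with the highest draw, so uncertain arms still get tried while arms that look good get played most often.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical per-arm success probabilities (unknown to the agent).
    Returns (pulls, successes, failures), each a per-arm list of counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Sample a success rate for each arm from its posterior
        # Beta(successes + 1, failures + 1), i.e. a uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled rates.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm and update its counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


# Example run with two assumed arms; the better arm should dominate the pulls.
pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
print("pulls per arm:", pulls)
```

The posterior sampling step is what separates this from a purely greedy rule: an arm with few observations has a wide posterior, so it occasionally produces the highest draw and gets explored.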