With the growing needs of online A/B testing to support the innovation i...
Despite the great interest in the bandit problem, designing efficient
al...
Batch reinforcement learning (RL) aims at finding an optimal policy in a...
In the new era of personalization, learning the heterogeneous treatment
...
Latent factor model estimation typically relies on either using domain
k...
High-quality data plays a central role in ensuring the accuracy of polic...
Online learning in large-scale structured bandits is known to be challen...
Two-sided markets such as ride-sharing companies often involve a gro...
How to explore efficiently is a central problem in multi-armed bandits. ...
Order dispatch is one of the central problems for ride-sharing platforms....
Off-policy evaluation learns a target policy's value with a historical
d...
Severe infectious diseases such as the novel coronavirus (COVID-19) pose...
The Markov assumption (MA) is fundamental to the empirical validity of
r...