In nonstationary bandit learning problems, the decision-maker must
conti...
We study the use of policy gradient algorithms to optimize over a class ...
We revisit the popular random matching market model introduced by Knuth
...
We consider a finite-horizon multi-armed bandit (MAB) problem in a
...