Large transformer models trained on diverse datasets have shown a remark...
In many bandit problems, the maximal reward achievable by a policy is of...
In offline reinforcement learning (RL), a learner leverages prior logged...
We study the problem of model selection in batch policy optimization: gi...
Deep reinforcement learning has achieved impressive successes yet often...
Maximum a posteriori (MAP) inference in discrete-valued Markov random fi...
Maximum a posteriori (MAP) inference is a fundamental computational para...