Non-Reversible Parallel Tempering: an Embarassingly Parallel MCMC Scheme
Parallel tempering (PT) methods are a popular class of Markov chain Monte Carlo schemes used to explore complex high-dimensional probability distributions. These algorithms can be highly effective but their performance is contingent on the selection of a suitable annealing schedule. In this work, we provide a new perspective on PT algorithms and their tuning, based on two main insights. First, we identify and formalize a sharp divide in the behaviour and performance of reversible versus non-reversible PT methods. Second, we analyze the behaviour of PT algorithms using a novel asymptotic regime in which the number of parallel compute cores goes to infinity. Based on this approach we show that a class of non-reversible PT methods dominates its reversible counterpart and identify distinct scaling limits for the non-reversible and reversible schemes, the former being a piecewise-deterministic Markov process (PDMP) and the latter a diffusion. In particular, we identify a class of non-reversible PT algorithms which is provably scalable to massive parallel implementation, in contrast to reversible PT algorithms, which are known to collapse in the massive parallel regime. We then bring these theoretical tools to bear on the development of novel methodologies. We develop an adaptive non-reversible PT scheme which estimates the event rate of the limiting PDMP and uses this estimated rate to approximate the optimal annealing schedule. We provide a wide range of numerical examples supporting and extending our theoretical and methodological contributions. Our adaptive non-reversible PT method outperforms experimentally state-of-the-art PT methods in terms of taking less time to adapt, as well as providing better target approximations. Our scheme has no tuning parameters and appears in our simulations robust to violations of the theoretical assumption used to carry out our analysis.
READ FULL TEXT