A Distributed Algorithm for Polya-Gamma Data Augmentation
The Polya-Gamma data augmentation (PG-DA) algorithm is routinely used for Bayesian inference in logistic models. It has broad applications and outperforms other sampling algorithms in numerical stability and ease of implementation, and the Markov chain it produces is known to be uniformly ergodic. However, the PG-DA algorithm is prohibitively slow in massive data settings because every iteration requires a full pass through the data. We develop a simple distributed extension of the PG-DA strategy based on the divide-and-conquer technique. It divides the data into a sufficiently large number of subsets, performs PG-type data augmentation on each subset in parallel using a powered likelihood, and produces Monte Carlo draws of the parameter by combining the Markov chain Monte Carlo (MCMC) draws obtained from each subset. The combined draws play the role of MCMC draws from the PG-DA algorithm in posterior inference. Our main contributions are threefold. First, we develop a modified PG-DA algorithm with a powered likelihood in logistic models, which is run on the subsets to obtain subset MCMC draws. Second, we modify the existing class of combination algorithms by introducing a scaling step. Finally, we demonstrate through diverse simulated and real data analyses that our distributed algorithm outperforms its competitors in statistical accuracy and computational efficiency. We also provide theoretical support for these empirical observations.
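The divide-and-conquer pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: a single scalar coefficient `beta` with no intercept, a N(0, `prior_var`) prior that is left unpowered on each subset, a Polya-Gamma sampler that truncates the infinite sum-of-gammas representation (so its draws are approximate), and a subset likelihood raised to the power k, the number of subsets. Under that power, observation i contributes PG(k, x_i·beta) latent variables and kappa_i = k(y_i − 1/2). The combination step uses plain quantile averaging, which is the one-dimensional Wasserstein barycenter and one member of the existing class of combination algorithms. It does not include the scaling step introduced in the paper. All function names are hypothetical.

```python
import math
import random


def sample_pg(b, c, rng, terms=50):
    """Approximate PG(b, c) draw via the truncated sum-of-gammas representation.
    Truncating at `terms` is an assumption; it slightly underestimates the mean."""
    c2 = c * c / (4.0 * math.pi ** 2)
    total = 0.0
    for k in range(1, terms + 1):
        total += rng.gammavariate(b, 1.0) / ((k - 0.5) ** 2 + c2)
    return total / (2.0 * math.pi ** 2)


def subset_gibbs(x, y, power, n_iter, rng, prior_var=100.0):
    """PG Gibbs sampler for a scalar logistic coefficient on one subset,
    with the subset likelihood raised to `power` (assumed: number of subsets)."""
    # kappa_i = power * (y_i - 1/2); only sum(kappa_i * x_i) is needed
    kappa_x = sum(power * (yi - 0.5) * xi for xi, yi in zip(x, y))
    beta, draws = 0.0, []
    for _ in range(n_iter):
        # omega_i | beta ~ PG(power, x_i * beta); accumulate posterior precision
        prec = 1.0 / prior_var
        for xi in x:
            prec += sample_pg(power, xi * beta, rng) * xi * xi
        # beta | omega ~ N(V * sum(kappa_i x_i), V) with V = 1 / prec
        var = 1.0 / prec
        beta = rng.gauss(var * kappa_x, math.sqrt(var))
        draws.append(beta)
    return draws


def combine_quantiles(subset_draws):
    """Average the empirical quantiles of scalar subset draws
    (the 1-D Wasserstein barycenter; no scaling step)."""
    sorted_draws = [sorted(d) for d in subset_draws]
    m = min(len(d) for d in sorted_draws)
    combined = []
    for j in range(m):
        q = (j + 0.5) / m
        vals = [d[min(int(q * len(d)), len(d) - 1)] for d in sorted_draws]
        combined.append(sum(vals) / len(vals))
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    true_beta = 1.0  # synthetic data for illustration only
    x = [rng.gauss(0.0, 1.0) for _ in range(160)]
    y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-true_beta * xi)) else 0 for xi in x]
    k = 4
    subsets = [(x[i::k], y[i::k]) for i in range(k)]
    # Run each subset independently (in parallel in practice), dropping burn-in
    draws = [subset_gibbs(xs, ys, power=k, n_iter=200, rng=rng)[50:] for xs, ys in subsets]
    combined = combine_quantiles(draws)
    print("combined posterior mean:", sum(combined) / len(combined))
```

The subsets are processed independently here only to keep the sketch short. In a real deployment each `subset_gibbs` call would run on a separate worker, and only the subset draws would be communicated for the combination step.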