Bayesian Sparse Gaussian Mixture Model in High Dimensions
We establish the minimax risk for parameter estimation in sparse high-dimensional Gaussian mixture models and show that a constrained maximum likelihood estimator (MLE) achieves minimax optimality. However, the optimization-based constrained MLE is computationally intractable because the problem is nonconvex. We therefore propose a Bayesian approach that uses a continuous spike-and-slab prior to estimate high-dimensional Gaussian mixtures whose cluster centers are sparse, and we prove that the posterior contraction rate of the proposed method is minimax optimal. The mis-clustering rate is obtained as a by-product using tools from matrix perturbation theory. Computationally, posterior inference can be implemented via an efficient Gibbs sampler with data augmentation, which avoids the challenging nonconvex optimization that frequentist algorithms require. The proposed Bayesian sparse Gaussian mixture model does not require pre-specifying the number of clusters, which is allowed to grow with the sample size and can be adaptively estimated via posterior inference. The validity and usefulness of the proposed method are demonstrated through simulation studies and the analysis of a real-world single-cell RNA sequencing dataset.
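To make the setting concrete, the following is a minimal sketch of simulating data from a Gaussian mixture with sparse cluster centers. It is not the authors' model or sampler: the function name `sample_sparse_gmm`, the two-Gaussian (narrow spike, wide slab) form of the continuous spike-and-slab, the shared per-coordinate sparsity pattern, the identity noise covariance, and all default values are assumptions chosen only for illustration.

```python
import numpy as np

def sample_sparse_gmm(n=200, p=50, k=3, slab_prob=0.1,
                      spike_sd=0.01, slab_sd=3.0, seed=0):
    """Simulate n points in p dimensions from k clusters with sparse centers.

    Hypothetical illustration: each coordinate is either "active" (slab,
    large variance) or "inactive" (spike, variance near zero) for all
    clusters at once.
    """
    rng = np.random.default_rng(seed)
    # Coordinate-level indicators: True means the slab component is used.
    active = rng.random(p) < slab_prob
    # Continuous spike-and-slab: every entry is Gaussian, but inactive
    # coordinates are shrunk tightly towards zero instead of set to zero.
    sd = np.where(active, slab_sd, spike_sd)
    centers = rng.normal(size=(k, p)) * sd
    # Uniform cluster assignments and unit-variance Gaussian noise (assumed).
    labels = rng.integers(0, k, size=n)
    X = centers[labels] + rng.normal(size=(n, p))
    return X, labels, centers, active

X, labels, centers, active = sample_sparse_gmm()
print(X.shape, centers.shape, int(active.sum()))
```

Data simulated this way has cluster centers that differ only on a small set of coordinates, which is the regime in which a sparsity-inducing prior on the centers is expected to help.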