Grouped Heterogeneous Mixture Modeling for Clustered Data
Clustered data, which have a grouping structure (e.g. postal area, school, individual, species), appear in a variety of scientific fields. The goal of statistical analysis of clustered data is to model the response as a function of covariates while accounting for heterogeneity among clusters. For this purpose, we consider estimating cluster-wise conditional distributions by mixtures of latent conditional distributions common to all the clusters, with cluster-wise different mixing proportions. To model the mixing proportions, we propose a structure in which clusters are divided into a finite number of groups and the mixing proportions are assumed to be the same within each group. The proposed model is interpretable, and the maximum likelihood estimator is easy to compute via the generalized EM algorithm. In the setting where the cluster sizes grow with, but much more slowly than, the number of clusters, some asymptotic properties of the maximum likelihood estimator are presented. Furthermore, we propose an information criterion for selecting two tuning parameters: the number of groups and the number of latent conditional distributions. Numerical studies demonstrate that the proposed model outperforms some other existing methods.
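As a rough illustration of the model structure described in the abstract, the cluster-wise conditional distribution can be written as a mixture whose weights depend only on the group to which a cluster belongs. The notation here is an assumption made for illustration and is not taken from the paper: $i$ indexes clusters, $g_i \in \{1, \dots, G\}$ is the group of cluster $i$, $L$ is the number of latent conditional distributions $h_k$ with parameters $\phi_k$, and $\pi_{gk}$ are the group-level mixing proportions.

```latex
% Sketch under assumed notation (symbols are illustrative, not the paper's own)
\[
  f_i(y \mid x) \;=\; \sum_{k=1}^{L} \pi_{g_i k}\, h_k(y \mid x;\, \phi_k),
  \qquad
  \pi_{gk} \ge 0,
  \quad
  \sum_{k=1}^{L} \pi_{gk} = 1
  \quad \text{for each } g = 1, \dots, G .
\]
```

Under this reading, the latent distributions $h_k$ are shared by all clusters, heterogeneity enters only through the $G$ vectors of mixing proportions, and the two tuning parameters selected by the information criterion correspond to $G$ and $L$.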