Statistical inference with anchored Bayesian mixture of regressions models: A case study analysis of allometric data
We present a case study in which we use a mixture of regressions model to improve on an ill-fitting simple linear regression model relating log brain mass to log body mass for 100 placental mammalian species. The slope of this regression model is of particular scientific interest because it corresponds to a constant that governs a hypothesized allometric power law relating brain mass to body mass. A specific line of investigation is to determine whether the regression parameters vary across subgroups of related species. We model these data using an anchored Bayesian mixture of regressions model, which modifies the standard Bayesian Gaussian mixture by pre-assigning small subsets of observations to given mixture components with probability one. These observations (called anchor points) break the relabeling invariance typical of exchangeable model specifications (the so-called label-switching problem). A careful choice of which observations to pre-classify to which mixture components is key to the specification of a well-fitting anchor model. In the article we compare three strategies for the selection of anchor points. The first assumes that the underlying mixture of regressions model holds and assigns anchor points to different components to maximize the information about their labeling. The second makes no assumption about the relationship between x and y and instead identifies anchor points using a bivariate Gaussian mixture model. The third strategy begins with the assumption that there is only one mixture regression component and identifies anchor points that are representative of a clustering structure based on case-deletion importance sampling weights. We compare the performance of the three strategies on the allometric data set and use auxiliary taxonomic information about the species to evaluate the model-based classifications estimated from these models.
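To illustrate the anchoring idea described above, the following is a minimal sketch, not the authors' implementation. It assumes a mixture of K simple linear regressions with known parameter values, as would be available at one step of a sampler. It computes each observation's component-membership probabilities, then fixes the membership of the anchor points with probability one. The function name `anchored_responsibilities`, the dictionary format for `anchors`, and the toy numbers are all hypothetical choices made for this example.

```python
import numpy as np

def anchored_responsibilities(x, y, betas, sigmas, weights, anchors):
    """Membership probabilities for a mixture of simple linear regressions.

    betas:   (K, 2) array-like of (intercept, slope) per component
    sigmas:  (K,) residual standard deviations
    weights: (K,) mixing proportions
    anchors: dict {observation index: component index} (hypothetical format)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = np.asarray(betas, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Component means: mu[i, k] = intercept_k + slope_k * x_i
    mu = betas[:, 0][None, :] + betas[:, 1][None, :] * x[:, None]

    # log( weight_k * Normal(y_i | mu_ik, sigma_k^2) )
    log_dens = (np.log(weights)[None, :]
                - np.log(sigmas)[None, :]
                - 0.5 * np.log(2.0 * np.pi)
                - 0.5 * ((y[:, None] - mu) / sigmas[None, :]) ** 2)

    # Normalise each row in a numerically stable way
    log_dens -= log_dens.max(axis=1, keepdims=True)
    resp = np.exp(log_dens)
    resp /= resp.sum(axis=1, keepdims=True)

    # Anchor points: membership is fixed with probability one
    for i, k in anchors.items():
        resp[i, :] = 0.0
        resp[i, k] = 1.0
    return resp

# Toy data (made up for illustration): two lines sharing an intercept
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 5.0, 7.0])
betas = [[1.0, 1.0], [1.0, 2.0]]   # (intercept, slope) for components 0 and 1
resp = anchored_responsibilities(x, y, betas, [0.5, 0.5], [0.5, 0.5], {0: 1})
```

In this toy example, observation 0 lies where the two lines coincide, so the data alone cannot say which component it belongs to. Anchoring it to component 1 removes that ambiguity for this point. In the full model, a small set of such pre-assigned points is what makes the components non-exchangeable.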