Dirichlet-tree multinomial mixtures for clustering microbiome compositions
A common routine in microbiome research is to identify reproducible patterns in the population through unsupervised clustering of samples. To this end, we introduce Dirichlet-tree multinomial mixtures (DTMM) as a generative model for amplicon sequencing data in microbiome studies. DTMM models the microbiome population with a Dirichlet process mixture to learn a clustering structure. For the mixing kernels, DTMM uses a phylogenetic tree directly to perform a tree-based decomposition of the Dirichlet distribution. Through this decomposition, DTMM offers a flexible covariance structure that captures large within-cluster variation, while borrowing information among samples in different clusters to accurately learn the part that the clusters have in common. We perform extensive simulation studies to evaluate the performance of DTMM and compare it to several model-based and distance-based clustering methods in the microbiome context. Finally, we analyze a specific version of the fecal data in the American Gut project to identify underlying clusters of the microbiota of IBD and diabetes patients. Our analysis shows that (i) clusters in the human gut microbiome are generally determined jointly by a large number of OTUs in a complex manner; (ii) OTUs from the genera Bacteroides, Prevotella and Ruminococcus are typically among the important OTUs in identifying clusters; (iii) the number of clusters and the OTUs that characterize each cluster can differ across patient groups.
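To make the tree-based decomposition concrete, the following is a minimal sketch of sampling OTU counts from a single Dirichlet-tree multinomial kernel. It is an illustration under stated assumptions, not the authors' implementation: the four-OTU binary tree `TREE`, the per-node parameters `ALPHA`, and the helper names `sample_leaf_probs` and `sample_counts` are all hypothetical, and the Dirichlet process mixture over clusters is omitted. On a binary tree, the Dirichlet at each internal node reduces to a Beta split of that node's probability mass between its two children.

```python
import numpy as np

# Hypothetical toy phylogenetic tree over 4 OTUs (leaves 0..3).
# Each internal node maps to its (left, right) children.
TREE = {
    "root": ("A", "B"),
    "A": (0, 1),
    "B": (2, 3),
}

# Assumed Beta parameters (left, right) for each internal node.
# Small values give a highly variable split; large values give a stable one.
ALPHA = {
    "root": (5.0, 5.0),
    "A": (0.5, 0.5),
    "B": (20.0, 20.0),
}


def sample_leaf_probs(tree, alpha, rng, root="root"):
    """Draw OTU probabilities by sampling one Beta split per internal node."""
    probs = {}

    def descend(node, mass):
        if node not in tree:  # leaf: record the accumulated probability mass
            probs[node] = mass
            return
        left, right = tree[node]
        a_left, a_right = alpha[node]
        p_left = rng.beta(a_left, a_right)  # share of mass sent to the left child
        descend(left, mass * p_left)
        descend(right, mass * (1.0 - p_left))

    descend(root, 1.0)
    return np.array([probs[leaf] for leaf in sorted(probs)])


def sample_counts(tree, alpha, depth, rng):
    """Draw one sample of OTU counts with the given sequencing depth."""
    p = sample_leaf_probs(tree, alpha, rng)
    return rng.multinomial(depth, p)


rng = np.random.default_rng(0)
counts = sample_counts(TREE, ALPHA, depth=1000, rng=rng)
print(counts)  # four non-negative counts summing to 1000
```

Because each internal node carries its own parameters, the variability of the split can differ from node to node, which is the source of the more flexible covariance structure compared with a single Dirichlet over all OTUs.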