Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

04/20/2020
by   Yong He, et al.

Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data are typically high-dimensional, compositional (lying in a simplex), and possibly leptokurtic, which makes conventional correlation analysis unsuitable for studying the co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known as the centered log-ratio covariance matrix, which is approximately indistinguishable from the true basis covariance matrix as the dimensionality tends to infinity. The proposed procedure can be viewed as adaptively thresholding the Median-of-Means estimator of the centered log-ratio covariance matrix. We derive its rates of convergence under the spectral norm and the element-wise ℓ_∞ norm, and we provide theoretical guarantees on support recovery. Thorough simulation studies show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to a microbiome dataset.
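The pipeline described in the abstract (centered log-ratio transform, Median-of-Means covariance, thresholding) can be illustrated with a rough sketch. This is not the authors' implementation. The function names, the pseudo-count used to avoid log(0), the random block split, the element-wise median over block covariances, and the single universal threshold `tau` are all assumptions made for illustration. In particular, the paper's adaptive procedure uses entry-dependent thresholds, which are not reproduced here.

```python
import numpy as np

def clr(counts, pseudo=0.5):
    """Centered log-ratio transform of an (n, p) matrix of counts or compositions.

    The pseudo-count is an assumption of this sketch, added only to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudo)
    # Subtract each sample's mean log value, so every row sums to zero.
    return logx - logx.mean(axis=1, keepdims=True)

def mom_covariance(z, n_blocks=5, seed=0):
    """Element-wise median of block-wise sample covariance matrices.

    A simplified Median-of-Means style estimator; each block needs >= 2 samples.
    """
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(z.shape[0]), n_blocks)
    covs = np.stack([np.cov(z[b], rowvar=False) for b in blocks])
    return np.median(covs, axis=0)

def hard_threshold(cov, tau):
    """Zero out off-diagonal entries with |value| < tau; keep the diagonal.

    A universal threshold stands in for the paper's adaptive thresholds.
    """
    out = np.where(np.abs(cov) >= tau, cov, 0.0)
    np.fill_diagonal(out, np.diag(cov))
    return out
```

A typical call would be `hard_threshold(mom_covariance(clr(counts)), tau)` for an `(n, p)` count matrix `counts`, with `tau` and `n_blocks` chosen by the user, for example via cross-validation.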
