De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices
Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix under the assumption that only a small number of its loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given independent observations X^i ∈ R^p, i = 1, ..., n, generated from an unknown distribution with an unknown covariance matrix Σ_0, our aim is to estimate the first vector of loadings and the largest eigenvalue of Σ_0 in a setting where p ≫ n. Besides the high dimensionality, a further challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in the ℓ_1- and ℓ_2-norms. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix Σ_0. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity in the first eigenvector of small order √n/p and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.