Polynomial Time and Sample Complexity for Non-Gaussian Component Analysis: Spectral Methods
The problem of Non-Gaussian Component Analysis (NGCA) is to find a maximal low-dimensional subspace E in R^n such that data points projected onto E follow a non-Gaussian distribution. Although this is an appropriate model for some real-world data analysis problems, there has been little progress on this problem over the last decade. In this paper, we attempt to address this state of affairs in two ways. First, we give a new characterization of standard Gaussian distributions in high dimensions, which leads to effective tests for non-Gaussianness. Second, we propose a simple algorithm, Reweighted PCA, as a method for solving the NGCA problem. We prove that for a general unknown non-Gaussian distribution, this algorithm recovers at least one direction in E, with sample and time complexity depending polynomially on the dimension of the ambient space. We conjecture that the algorithm actually recovers the entire subspace E.
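The abstract does not spell out the steps of Reweighted PCA, so the following is only an illustrative sketch of the general idea, not the paper's exact procedure. It relies on a standard fact: for x drawn from a standard Gaussian in R^n, the reweighted second-moment matrix E[exp(-alpha |x|^2) x x^T] is a scalar multiple of the identity, so an eigenvalue that stands apart from the rest hints at a non-Gaussian direction. The function name `reweighted_pca_direction`, the Gaussian weight, the choice alpha = 0.5, the median-based outlier rule, and the synthetic data (one Rademacher coordinate, the rest standard Gaussian, already isotropic) are all assumptions made for illustration.

```python
import numpy as np

def reweighted_pca_direction(X, alpha=0.5):
    """Estimate one direction in the non-Gaussian subspace (illustrative sketch).

    X is an (N, n) array of samples assumed to be centered and isotropic.
    alpha is a hypothetical reweighting parameter, not taken from the paper.
    """
    N = X.shape[0]
    # Radial weights exp(-alpha * |x|^2) for every sample.
    w = np.exp(-alpha * np.sum(X**2, axis=1))
    # Reweighted second-moment matrix: (1/N) * sum_i w_i x_i x_i^T.
    M = (X * w[:, None]).T @ X / N
    # M is symmetric, so eigh applies; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(M)
    # Assumed heuristic: when the non-Gaussian subspace is low-dimensional,
    # most eigenvalues coincide, so pick the one farthest from the median.
    k = int(np.argmax(np.abs(eigvals - np.median(eigvals))))
    return eigvecs[:, k], eigvals

# Synthetic data: coordinate 0 is Rademacher (+1/-1, unit variance),
# the remaining coordinates are standard Gaussian.
rng = np.random.default_rng(0)
N, n = 50_000, 5
X = rng.standard_normal((N, n))
X[:, 0] = rng.choice([-1.0, 1.0], size=N)

v, eigvals = reweighted_pca_direction(X)
print("estimated direction:", np.round(v, 3))
print("eigenvalues:", np.round(eigvals, 4))
```

In this toy setup the population matrix is diagonal, and the entry for the Rademacher coordinate exceeds the others by a factor of 1 + 2 alpha, so the selected eigenvector should align closely with the first coordinate axis (up to sign). Real data would first need centering and whitening, and the outlier rule here is a convenience for the example rather than a tested threshold.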