Distributed Learning for Principal Eigenspaces without Moment Constraints

04/29/2022
by Yong He, et al.

Distributed Principal Component Analysis (PCA) has been studied to deal with the case where data are stored across multiple machines and communication costs or privacy concerns prohibit computing PCA at a central location. However, the sub-Gaussian assumption in the related literature is restrictive in real applications, where outliers and heavy-tailed data are common in areas such as finance and macroeconomics. In this article, we propose a distributed algorithm for estimating the principal eigenspaces without any moment constraint on the underlying distribution. We study the problem under the elliptical family framework and adopt the sample multivariate Kendall's tau matrix to extract eigenspace estimators from all sub-machines, which can be viewed as points on the Grassmann manifold. We then find the "center" of these points as the final distributed estimator of the principal eigenspace. We investigate the bias and variance of the distributed estimator and derive its convergence rate, which depends on the effective rank and eigengap of the scatter matrix and on the number of sub-machines. We show that the distributed estimator performs as if we had full access to the whole data. Simulation studies show that the distributed algorithm performs comparably with existing ones for light-tailed data, while showing great advantages for heavy-tailed data. We also extend our algorithm to the distributed learning of elliptical factor models and verify its empirical usefulness through a real application to a macroeconomic dataset.
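To make the pipeline concrete, below is a minimal sketch in Python/NumPy of the two-stage procedure the abstract describes: each sub-machine forms its sample multivariate Kendall's tau matrix and extracts the top-m eigenvectors, and the local estimators are then aggregated by averaging their projection matrices and re-extracting the top-m eigenvectors, one natural way to compute a "center" on the Grassmann manifold. The function names and the projection-averaging aggregation step are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def kendall_tau_matrix(X):
    """Sample multivariate Kendall's tau matrix of X (n x p):
    the average of (X_i - X_j)(X_i - X_j)^T / ||X_i - X_j||^2
    over all pairs i < j."""
    n, p = X.shape
    K = np.zeros((p, p))
    for i in range(n):
        diffs = X[i + 1:] - X[i]               # differences against row i
        norms = np.sum(diffs ** 2, axis=1)
        norms[norms == 0] = np.inf             # tied observations contribute zero
        scaled = diffs / np.sqrt(norms)[:, None]
        K += scaled.T @ scaled
    return 2.0 * K / (n * (n - 1))

def local_eigenspace(X, m):
    """Top-m eigenvectors of the local Kendall's tau matrix."""
    _, vecs = np.linalg.eigh(kendall_tau_matrix(X))   # eigenvalues ascending
    return vecs[:, -m:]

def aggregate(V_list):
    """Assumed aggregation: average the projection matrices V V^T of the
    local estimators and take the top-m eigenvectors of the average."""
    P_bar = sum(V @ V.T for V in V_list) / len(V_list)
    _, vecs = np.linalg.eigh(P_bar)
    m = V_list[0].shape[1]
    return vecs[:, -m:]

# Usage: split heavy-tailed data across 10 "machines", estimate, aggregate.
rng = np.random.default_rng(0)
X = rng.standard_t(df=3, size=(2000, 20))             # heavy-tailed sample
machines = np.array_split(X, 10)
V_hat = aggregate([local_eigenspace(Xl, m=3) for Xl in machines])
```

Under the elliptical family, the Kendall's tau matrix shares the eigenvectors of the scatter matrix while requiring no moments, which is why the local step above is robust to heavy tails; only the m eigenvectors (not the raw data) leave each machine.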
