Fundamental Limits of Low-Rank Matrix Estimation with Diverging Aspect Ratios
We consider the problem of estimating the factors of a low-rank n × d matrix when the matrix is corrupted by additive Gaussian noise. A special case of our setting is clustering mixtures of Gaussians with equal (known) covariances. Simple spectral methods do not take into account the distribution of the entries of these factors and are therefore often suboptimal. Here, we characterize the asymptotics of the minimum estimation error under the assumption that the distribution of the entries is known to the statistician. Our results apply to the high-dimensional regime n, d → ∞ with d/n → ∞ (or d/n → 0) and generalize earlier work that focused on the proportional asymptotics n, d → ∞, d/n → δ ∈ (0, ∞). We outline an interesting signal strength regime in which d/n → ∞ and partial recovery is possible for the left singular vectors but impossible for the right singular vectors. We illustrate the general theory by deriving consequences for Gaussian mixture clustering and carrying out a numerical study on genomics data.
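To make the setting concrete, the following is a minimal sketch of the simple spectral baseline mentioned above, applied to a two-cluster Gaussian mixture written as a rank-one matrix plus Gaussian noise. It is not the estimator analyzed in the paper. The function names, the ±1 label prior, the identity noise covariance, and the normalization of the mean vector are all illustrative assumptions.

```python
import numpy as np


def simulate_two_cluster_mixture(n, d, signal, seed=0):
    """Draw X = labels * mean^T + Z, an n x d rank-one signal plus i.i.d. N(0, 1) noise.

    Assumptions (not from the source): labels are uniform on {-1, +1},
    the cluster means are +/- mean, and ||mean|| equals `signal`.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1.0, 1.0], size=n)
    mean = rng.standard_normal(d)
    mean *= signal / np.linalg.norm(mean)
    X = np.outer(labels, mean) + rng.standard_normal((n, d))
    return X, labels


def spectral_labels(X):
    """Estimate labels from the signs of the top left singular vector of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.where(U[:, 0] >= 0, 1.0, -1.0)


def overlap(estimate, truth):
    """Absolute normalized inner product; the labels are identifiable only up to a global sign."""
    return abs(float(np.mean(estimate * truth)))
```

Calling `overlap(spectral_labels(X), labels)` on a simulated pair gives a value in [0, 1], with 1 meaning perfect clustering up to a global sign flip. This baseline uses only the top singular vector and ignores the known ±1 distribution of the labels, which is the kind of information the abstract says spectral methods leave unused.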