Robust Gaussian Covariance Estimation in Nearly-Matrix Multiplication Time
Robust covariance estimation is the following, well-studied problem in high dimensional statistics: given N samples from a d-dimensional Gaussian 𝒩(0, Σ), but where an ε-fraction of the samples have been arbitrarily corrupted, output Σ minimizing the total variation distance between 𝒩(0, Σ) and 𝒩(0, Σ). This corresponds to learning Σ in a natural affine-invariant variant of the Frobenius norm known as the Mahalanobis norm. Previous work of Cheng et al demonstrated an algorithm that, given N = Ω (d^2 / ε^2) samples, achieved a near-optimal error of O(εlog 1 / ε), and moreover, their algorithm ran in time O(T(N, d) logκ / poly (ε)), where T(N, d) is the time it takes to multiply a d × N matrix by its transpose, and κ is the condition number of Σ. When ε is relatively small, their polynomial dependence on 1/ε in the runtime is prohibitively large. In this paper, we demonstrate a novel algorithm that achieves the same statistical guarantees, but which runs in time O (T(N, d) logκ). In particular, our runtime has no dependence on ε. When Σ is reasonably conditioned, our runtime matches that of the fastest algorithm for covariance estimation without outliers, up to poly-logarithmic factors, showing that we can get robustness essentially "for free."
READ FULL TEXT