List-Decodable Linear Regression
We give the first polynomial-time algorithm for robust regression in the list-decodable setting, where an adversary can corrupt more than a 1/2 fraction of the examples. For any α < 1, our algorithm takes as input a sample {(x_i, y_i)}_{i ≤ n} of n linear equations, where αn of the equations satisfy y_i = ⟨x_i, ℓ^*⟩ + ζ for some small noise ζ and the remaining (1−α)n equations are arbitrarily chosen. It outputs a list L of size O(1/α), a fixed constant, that contains an ℓ close to ℓ^*. Our algorithm succeeds whenever the inliers are drawn from a certifiably anti-concentrated distribution D. As a special case, this yields a (d/α)^O(1/α^8)-time algorithm that finds an O(1/α)-size list when the inlier distribution is a standard Gaussian. The anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm works for more general distributions under the additional assumption that ℓ^* is Boolean valued. To solve the problem, we introduce a new framework for list-decodable learning that strengthens the sum-of-squares `identifiability to algorithms' paradigm. In independent work, Raghavendra and Yau [RY19] obtained a similar result for list-decodable regression, also using the sum-of-squares method.
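To make the input model concrete, here is a minimal sketch (in Python with NumPy) of how a list-decodable regression instance of the kind described above could be generated: αn inlier equations satisfying y_i = ⟨x_i, ℓ^*⟩ + ζ with Gaussian x_i (the special case above), and (1−α)n adversarial equations. The function name and the specific choice of adversary, equations consistent with a different vector, are illustrative assumptions, not part of the paper.

```python
import numpy as np

def make_instance(n=1000, d=10, alpha=0.3, noise=0.01, seed=0):
    """Sketch of a list-decodable regression instance (illustrative only).

    The first int(alpha * n) rows are inliers satisfying
    y_i = <x_i, l_star> + zeta with x_i ~ N(0, I_d); the rest are
    outliers chosen here to fit a different vector, a simple stand-in
    for an arbitrary adversary.
    """
    rng = np.random.default_rng(seed)
    l_star = rng.standard_normal(d)
    l_star /= np.linalg.norm(l_star)          # unit-norm target

    n_in = int(alpha * n)
    X_in = rng.standard_normal((n_in, d))     # inlier covariates
    y_in = X_in @ l_star + noise * rng.standard_normal(n_in)

    l_bad = rng.standard_normal(d)            # adversary's planted vector
    X_out = rng.standard_normal((n - n_in, d))
    y_out = X_out @ l_bad                     # outlier equations

    X = np.vstack([X_in, X_out])
    y = np.concatenate([y_in, y_out])
    return X, y, l_star
```

With α < 1/2, as here, no single vector can be uniquely recovered (the adversary's equations may be perfectly consistent with a different ℓ), which is why the algorithm must output a short list rather than one estimate.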