Memory-Sample Tradeoffs for Linear Regression with Small Error
We consider the problem of performing linear regression over a stream of d-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples (a_1, b_1), (a_2, b_2), ..., with a_i drawn independently from a d-dimensional isotropic Gaussian, and where b_i = 〈a_i, x〉 + η_i, for a fixed x ∈ R^d with ‖x‖_2 = 1 and with independent noise η_i drawn uniformly from the interval [-2^{-d/5}, 2^{-d/5}]. We show that any algorithm with at most d^2/4 bits of memory requires at least Ω(d log(1/ϵ)) samples to approximate x to ℓ_2 error ϵ with probability of success at least 2/3, for ϵ sufficiently small as a function of d. In contrast, for such ϵ, x can be recovered to error ϵ with probability 1 − o(1) with memory O(d^2 log(1/ϵ)) using d examples. These are the first nontrivial lower bounds for regression with super-linear memory, and may open the door to strong memory/sample tradeoffs for continuous optimization.
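To make the setup concrete, the following is a minimal simulation sketch of the memory-unconstrained baseline: it draws d Gaussian examples with the tiny uniform label noise described above, then recovers x by solving the resulting d × d linear system, which stores all d examples and hence uses quadratic memory. It assumes NumPy; the dimension d = 50 and the random seed are illustrative choices, not taken from the paper.

```python
# Minimal simulation of the abstract's setup (assumes NumPy;
# d = 50 and the seed are illustrative assumptions, not from the paper).
import numpy as np

rng = np.random.default_rng(0)
d = 50                                   # kept small so 2^{-d/5} is representable

x = rng.standard_normal(d)
x /= np.linalg.norm(x)                   # fixed unit vector, ||x||_2 = 1

# Stream of d labeled examples: a_i ~ N(0, I_d), b_i = <a_i, x> + eta_i,
# with eta_i uniform on [-2^{-d/5}, 2^{-d/5}].
A = rng.standard_normal((d, d))
noise_bound = 2.0 ** (-d / 5)
eta = rng.uniform(-noise_bound, noise_bound, size=d)
b = A @ x + eta

# Memory-rich recovery: store all d examples (~d^2 numbers, i.e. quadratic
# memory) and solve the d x d linear system directly.
x_hat = np.linalg.solve(A, b)
print("l2 error:", np.linalg.norm(x_hat - x))   # tiny, driven by the noise level
```

Because the noise is exponentially small in d, the solved system recovers x to correspondingly small ℓ_2 error from just d examples; the paper's lower bound says that no algorithm restricted to d^2/4 bits can match this sample efficiency.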