Fundamental Barriers to High-Dimensional Regression with Convex Penalties

03/25/2019
by Michael Celentano, et al.

In high-dimensional regression, we attempt to estimate a parameter vector β_0 ∈ R^p from n ≲ p observations {(y_i, x_i)}_{i ≤ n}, where x_i ∈ R^p is a vector of predictors and y_i is a response variable. A well-established approach uses convex regularizers to promote specific structures (e.g., sparsity) in the estimate β̂, while allowing for practical algorithms. Theoretical analysis implies that convex penalization schemes have nearly optimal estimation properties in certain settings. However, in general the gaps between statistically optimal estimation (with unbounded computational resources) and convex methods are poorly understood. We show that, in general, a large gap exists between the best performance achievable by any convex regularizer and the optimal statistical error. Remarkably, we demonstrate that this gap is generic as soon as we try to incorporate very simple structural information about the empirical distribution of the entries of β_0. Our results follow from a detailed study of standard Gaussian designs, a setting that is normally considered particularly friendly to convex regularization schemes such as the Lasso. We prove a lower bound on the estimation error achieved by any convex regularizer that is invariant under permutations of the coordinates of its argument. This bound is expected to be generally tight, and indeed we prove tightness under certain conditions. Further, it implies a gap with respect to Bayes-optimal estimation that can be precisely quantified and that persists even if the prior distribution of the signal β_0 is known to the statistician. Our results provide rigorous evidence towards a broad conjecture regarding computational-statistical gaps in high-dimensional estimation.
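To make the setting concrete, the sketch below instantiates the regression problem the abstract describes: a standard Gaussian design, a sparse signal β_0, convex (ℓ1/Lasso) penalization, and the per-coordinate squared estimation error ‖β̂ − β_0‖²/p, the quantity whose fundamental limits the paper studies. All specific values (dimensions, sparsity, noise level, and the penalty strength `lam`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Problem dimensions: n observations, p predictors, with n comparable to p.
n, p = 500, 1000
rng = np.random.default_rng(0)

# Standard Gaussian design: i.i.d. N(0, 1/n) entries, so columns have
# roughly unit norm -- the setting the paper treats as friendly to the Lasso.
X = rng.normal(size=(n, p)) / np.sqrt(n)

# Sparse signal beta_0: a small fraction of nonzero coordinates.
beta0 = np.zeros(p)
support = rng.choice(p, size=50, replace=False)
beta0[support] = rng.normal(size=50)

# Noisy linear observations y = X beta_0 + noise.
sigma = 0.5
y = X @ beta0 + sigma * rng.normal(size=n)

# Convex (l1) penalization: the Lasso estimate
#   beta_hat = argmin_b  (1/(2n)) * ||y - X b||^2  +  lam * ||b||_1
# (this is exactly scikit-learn's Lasso objective).
lam = 0.05  # illustrative choice; in practice tuned, e.g., by cross-validation
beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# Per-coordinate squared estimation error ||beta_hat - beta_0||^2 / p.
print("estimation error:", np.sum((beta_hat - beta0) ** 2) / p)
```

The paper's lower bound says that, over permutation-invariant convex regularizers (of which the ℓ1 norm above is one example), no choice of penalty can drive this error all the way down to the Bayes-optimal level, even when the empirical distribution of the entries of β_0 is known.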

