On the generalization error of norm penalty linear regression models
We study linear regression problems of the form inf_{β∈ℝ^d} (𝔼_{ℙ_n}[|Y - 𝐗^⊤β|^r])^{1/r} + δρ(β), with r ≥ 1, a convex penalty ρ, and the empirical measure ℙ_n of the data. Well-known examples include the square-root lasso, square-root sorted ℓ_1 penalization, and penalized least absolute deviations regression. We show that, under benign regularity assumptions on ρ, such procedures naturally provide robust generalization: the problem can be reformulated as a distributionally robust optimization (DRO) problem over a type of max-sliced Wasserstein ball B_δ^ρ(ℙ_n), i.e., β solves the linear regression problem if and only if it solves inf_{β∈ℝ^d} sup_{ℚ∈B^ρ_δ(ℙ_n)} 𝔼_ℚ[|Y - 𝐗^⊤β|^r]. Our proof of this result is constructive: it identifies the worst-case measure in the DRO problem, which is given by an additive perturbation of ℙ_n. We argue that the balls B_δ^ρ(ℙ_n) are the natural ones to consider in this framework, as they yield a procedure whose computational cost is comparable to that of non-robust methods together with optimal robustness guarantees. In fact, our generalization bounds are of order d/n, up to logarithmic factors, and thus do not suffer from the curse of dimensionality, in contrast to known generalization bounds based on the Wasserstein metric on ℝ^d. Moreover, the bounds provide theoretical support for choosing a regularization parameter δ of the same order in the linear regression problem.
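To make the estimator concrete, the following is a minimal sketch of the special case r = 2 with ρ = ‖·‖_1, i.e. the square-root lasso, solved with the off-the-shelf convex solver cvxpy. The synthetic data, the value of δ, and the use of cvxpy are illustrative assumptions and not the paper's implementation.

```python
# Minimal sketch: square-root lasso, i.e. r = 2 and rho = l1 norm, as one
# instance of inf_beta (E_{P_n}[|Y - X^T beta|^r])^{1/r} + delta * rho(beta).
# Synthetic data and cvxpy are illustrative choices, not the paper's code.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Illustrative regularization level of order d/n, the order suggested by the
# abstract's generalization bounds (up to logarithmic factors).
delta = d / n

beta = cp.Variable(d)
# (E_{P_n}[(Y - X^T beta)^2])^{1/2} equals ||y - X beta||_2 / sqrt(n).
empirical_risk = cp.norm(y - X @ beta, 2) / np.sqrt(n)
penalty = delta * cp.norm(beta, 1)
problem = cp.Problem(cp.Minimize(empirical_risk + penalty))
problem.solve()

print("estimated beta:", np.round(beta.value, 3))
```

Because the objective is a norm plus a convex penalty, this is a standard second-order cone program; the same template covers other penalties ρ by swapping the penalty term.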