On the Optimal Weighted ℓ_2 Regularization in Overparameterized Linear Regression

06/10/2020
by Denny Wu, et al.

We consider the linear model 𝐲 = 𝐗β_⋆ + ϵ with 𝐗 ∈ ℝ^{n×p} in the overparameterized regime p > n. We estimate β_⋆ via generalized (weighted) ridge regression: β̂_λ = (𝐗^T𝐗 + λΣ_w)^† 𝐗^T𝐲, where Σ_w is the weighting matrix. Assuming a random effects model with general data covariance Σ_x and an anisotropic prior on the true coefficients β_⋆, i.e., 𝔼[β_⋆β_⋆^T] = Σ_β, we provide an exact characterization of the prediction risk 𝔼(y − 𝐱^Tβ̂_λ)^2 in the proportional asymptotic limit p/n → γ ∈ (1, ∞). Our general setup leads to a number of interesting findings. We outline precise conditions that determine the sign of the optimal setting λ_opt of the ridge parameter λ and confirm the implicit ℓ_2 regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that λ_opt can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when 𝐗 and β_⋆ are non-isotropic. Finally, we determine the optimal Σ_w for both the ridgeless (λ → 0) and optimally regularized (λ = λ_opt) cases, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
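
To make the setup concrete, the following is a minimal NumPy sketch of the generalized ridge estimator β̂_λ = (𝐗^T𝐗 + λΣ_w)^† 𝐗^T𝐲 together with a Monte Carlo estimate of the prediction risk. The function name `weighted_ridge` and the toy choices of Σ_x, Σ_β, noise level, and λ are illustrative assumptions, not the paper's experimental configuration.

```python
import numpy as np

def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized (weighted) ridge estimator:
    beta_hat = (X^T X + lam * Sigma_w)^+ X^T y.
    Standard ridge is Sigma_w = I; lam may also be taken negative,
    as studied in the paper."""
    return np.linalg.pinv(X.T @ X + lam * Sigma_w) @ (X.T @ y)

# Toy overparameterized instance (p > n) with anisotropic data and prior
# covariances (all values below are arbitrary illustrative choices).
rng = np.random.default_rng(0)
n, p = 100, 300                                   # gamma = p / n = 3
Sigma_x = np.diag(np.linspace(0.5, 2.0, p))       # data covariance Sigma_x
Sigma_beta = np.diag(np.linspace(2.0, 0.5, p))    # prior covariance Sigma_beta
beta_star = rng.multivariate_normal(np.zeros(p), Sigma_beta / p)
X = rng.multivariate_normal(np.zeros(p), Sigma_x, size=n)
y = X @ beta_star + 0.5 * rng.standard_normal(n)

# Fit with Sigma_w = I (standard ridge); the "ridgeless" case is lam -> 0.
beta_hat = weighted_ridge(X, y, lam=0.1, Sigma_w=np.eye(p))

# Monte Carlo estimate of the prediction risk E[(y - x^T beta_hat)^2].
X_test = rng.multivariate_normal(np.zeros(p), Sigma_x, size=5000)
y_test = X_test @ beta_star + 0.5 * rng.standard_normal(5000)
risk = np.mean((y_test - X_test @ beta_hat) ** 2)
print(f"estimated prediction risk: {risk:.3f}")
```

Sweeping `lam` over a grid (including negative values) and replacing `np.eye(p)` with other positive semi-definite weighting matrices gives a simple empirical counterpart to the paper's comparison of λ_opt and the optimal Σ_w against standard ridge regression and PCR.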
