On the Optimal Weighted ℓ_2 Regularization in Overparameterized Linear Regression
We consider the linear model y = Xβ_⋆ + ε with X ∈ ℝ^{n×p} in the overparameterized regime p > n. We estimate β_⋆ via generalized (weighted) ridge regression: β̂_λ = (X^T X + λΣ_w)^† X^T y, where Σ_w is the weighting matrix. Assuming a random effects model with general data covariance Σ_x and anisotropic prior on the true coefficients β_⋆, i.e., 𝔼β_⋆β_⋆^T = Σ_β, we provide an exact characterization of the prediction risk 𝔼(y − x^T β̂_λ)^2 in the proportional asymptotic limit p/n → γ ∈ (1, ∞). Our general setup leads to a number of interesting findings. We outline precise conditions that decide the sign of the optimal setting λ_opt for the ridge parameter λ and confirm the implicit ℓ_2 regularization effect of overparameterization, which theoretically justifies the surprising empirical observation that λ_opt can be negative in the overparameterized regime. We also characterize the double descent phenomenon for principal component regression (PCR) when X and β_⋆ are non-isotropic. Finally, we determine the optimal Σ_w for both the ridgeless (λ → 0) and optimally regularized (λ = λ_opt) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
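The estimator defined in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `weighted_ridge`, the synthetic isotropic data, the noise level, and the choice Σ_w = I are all assumptions made purely for illustration.

```python
import numpy as np


def weighted_ridge(X, y, lam, Sigma_w):
    """Generalized ridge estimate (X^T X + lam * Sigma_w)^† X^T y.

    Hypothetical helper for illustration; the pseudoinverse mirrors the
    dagger in the abstract's formula.
    """
    gram = X.T @ X + lam * Sigma_w
    return np.linalg.pinv(gram) @ (X.T @ y)


# Synthetic data in the overparameterized regime p > n (assumed setup).
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

# Sigma_w = I recovers standard ridge regression as a special case.
Sigma_w = np.eye(p)
beta_hat = weighted_ridge(X, y, lam=1.0, Sigma_w=Sigma_w)
```

With Σ_w = I and λ > 0 this reduces to standard ridge regression, so the result can be checked against the dual form X^T (X X^T + λI)^{-1} y. Passing a different positive definite `Sigma_w` gives the weighted variant. The function also accepts a negative `lam`, although the sketch says nothing about which sign or weighting is optimal.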