The conditionality principle in high-dimensional regression
Consider a high-dimensional linear regression problem in which the number of covariates exceeds the number of observations. The goal is to estimate σ^2, the conditional variance of the response given the covariates. Two frameworks are considered, conditional and unconditional, where conditioning is with respect to the covariates, which are ancillary for σ^2. In recent papers, a consistent estimator was developed in the unconditional framework for the case where the marginal distribution of the covariates is normal with known mean and variance. In this work, a Bayesian hypothesis testing problem is formulated under the conditional framework, and it is shown that the Bayes risk is constant. However, when the marginal distribution of the covariates is normal, a rule based on the above consistent estimator can be constructed whose Bayes risk converges to zero with probability converging to one. This means that, with high probability, its Bayes risk is smaller than that of the Bayes rule. It follows that even in the conditional setting, information about the marginal distribution of an ancillary statistic may have a significant impact on statistical inference. The practical implication for high-dimensional regression models is that additional observations in which only the covariates are recorded are potentially very useful and should not be ignored.
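For concreteness, the sketch below illustrates one moment-based estimator of σ^2 of the kind referenced above, in the spirit of Dicker's (2014, Biometrika) method-of-moments approach under a Gaussian design. It assumes the rows of X are i.i.d. N(0, I_p), i.e., the covariate distribution is fully known; the specific formula and the simulation parameters are illustrative assumptions, not necessarily the exact construction used in the paper.

```python
import numpy as np

def moment_variance_estimator(X, y):
    """Moment-based estimate of sigma^2 in y = X @ beta + eps.

    Assumes the rows of X are i.i.d. N(0, I_p) (known covariate
    distribution). Uses the identities
        E||y||^2     = n * (sigma^2 + ||beta||^2)
        E||X.T @ y||^2 = n*(n + p + 1)*||beta||^2 + n*p*sigma^2,
    whose combination below eliminates ||beta||^2 and is unbiased
    for sigma^2, even when p > n.
    """
    n, p = X.shape
    norm_y2 = np.sum(y ** 2)
    norm_Xty2 = np.sum((X.T @ y) ** 2)
    return ((p + n + 1) / (n * (n + 1))) * norm_y2 - norm_Xty2 / (n * (n + 1))

# Illustrative check with more covariates than observations (p > n).
rng = np.random.default_rng(0)
n, p, sigma = 200, 400, 1.5
X = rng.standard_normal((n, p))            # known Gaussian design
beta = rng.standard_normal(p) / np.sqrt(p)  # ||beta||^2 is approximately 1
y = X @ beta + sigma * rng.standard_normal(n)
print(moment_variance_estimator(X, y))     # should be close to sigma^2 = 2.25
```

Such estimators are consistent as n and p grow proportionally, which is what makes the unconditional framework tractable here: the construction leans directly on the known marginal distribution of the covariates.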