Cross-trait prediction accuracy of high-dimensional ridge-type estimators in genome-wide association studies
Marginal association summary statistics have attracted great attention in statistical genetics, mainly because the primary results of most genome-wide association studies (GWAS) are produced by marginal screening. In this paper, we study the prediction accuracy of the marginal estimator in dense (or sparsity-free) high-dimensional settings with (n, p, m) → ∞, m/n → γ ∈ (0, ∞), and p/n → ω ∈ (0, ∞). We consider a general correlation structure among the p features and allow an unknown subset of m of them to be signals. As the marginal estimator can be viewed as a ridge estimator with regularization parameter λ → ∞, we further investigate a class of ridge-type estimators in a unifying framework, including the popular best linear unbiased prediction (BLUP) in genetics. We find that the influence of λ on out-of-sample prediction accuracy depends heavily on ω. Although selecting an optimal λ can be important when p and n are comparable, the out-of-sample R^2 of ridge-type estimators becomes near-optimal for any λ ∈ (0, ∞) as ω increases. For example, when features are independent, the out-of-sample R^2 is always bounded from above by 1/ω and is largely invariant to λ for large ω (say, ω > 5). We also find that the in-sample R^2 shows completely different patterns and depends much more on λ than the out-of-sample R^2 does. In practice, our analysis delivers useful messages for genome-wide polygenic risk prediction and for the computation-accuracy trade-off in dense high-dimensional settings. We numerically illustrate our results in simulation studies and a real data example.
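The link between the marginal estimator and ridge regression with λ → ∞ can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes independent standard Gaussian features, Gaussian signals on the first m features, and a hypothetical signal-variance fraction `h2`; the function names and all default values (n = 200, p = 1000, so ω = 5) are illustrative choices, and out-of-sample R^2 is taken here to be the squared correlation between predicted and observed test outcomes.

```python
import numpy as np

def ridge_estimator(X, y, lam):
    """Ridge estimator (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def marginal_estimator(X, y):
    """Marginal estimator, proportional to X'y when columns are standardized."""
    return X.T @ y / X.shape[0]

def out_of_sample_r2(beta_hat, X_test, y_test):
    """Squared correlation between predictions and observed test outcomes."""
    pred = X_test @ beta_hat
    return np.corrcoef(pred, y_test)[0, 1] ** 2

def simulate(n=200, p=1000, m=500, h2=0.5, n_test=2000, seed=0):
    """Dense linear model with independent features (all settings are assumptions)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X_test = rng.standard_normal((n_test, p))
    beta = np.zeros(p)
    beta[:m] = rng.standard_normal(m) * np.sqrt(h2 / m)  # m signals out of p
    noise_sd = np.sqrt(1.0 - h2)
    y = X @ beta + noise_sd * rng.standard_normal(n)
    y_test = X_test @ beta + noise_sd * rng.standard_normal(n_test)
    return X, y, X_test, y_test

X, y, X_test, y_test = simulate()

# Out-of-sample R^2 over a grid of ridge penalties, plus the marginal estimator.
lambdas = (1.0, 100.0, 1e4, 1e8)
r2_by_lambda = {
    lam: out_of_sample_r2(ridge_estimator(X, y, lam), X_test, y_test)
    for lam in lambdas
}
r2_marginal = out_of_sample_r2(marginal_estimator(X, y), X_test, y_test)

for lam, r2 in r2_by_lambda.items():
    print(f"lambda = {lam:g}: out-of-sample R^2 = {r2:.4f}")
print(f"marginal estimator: out-of-sample R^2 = {r2_marginal:.4f}")
```

Because squared correlation is invariant to rescaling the estimator, the ridge fit with a very large λ and the marginal estimator should give essentially the same out-of-sample R^2 in this sketch. Changing `n` and `p` to vary ω = p/n gives a rough way to see how sensitive the out-of-sample R^2 is to λ.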