Feature-specific inference for penalized regression using local false discovery rates
Penalized regression methods, most notably the lasso, are a popular approach to analyzing high-dimensional data. An attractive property of the lasso is that it naturally performs variable selection. An important concern, however, is the reliability of these variable selections. Motivated by local false discovery rate methodology from the large-scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature, or to estimate the model's overall false discovery rate (Fdr). The method can be used for all values of λ. This is particularly useful for models with a few highly significant features but a high overall Fdr, which occur relatively often when cross-validation is used to select λ. The method is also flexible enough to be applied to many varieties of penalized likelihoods, including GLM and Cox models, and to a variety of penalties, including MCP and SCAD. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to two case studies involving high-dimensional genetic data.
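To make the relationship between feature-level and model-level rates concrete, the following is a minimal sketch and not the authors' implementation. It relies on the standard result from the local false discovery rate literature that the Fdr of a set of features can be estimated by averaging the local rates over that set; whether the paper's estimator takes exactly this form is an assumption. The function name `estimated_fdr`, the 0.10 cutoff, and all numeric values are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def estimated_fdr(local_fdr, selected):
    """Average the local fdr over the selected features.

    Assumption: the overall Fdr of a selected set is estimated by the
    mean of the local rates within that set.
    """
    chosen = [rate for rate, keep in zip(local_fdr, selected) if keep]
    if not chosen:
        return 0.0  # convention used here: an empty selection has no false discoveries
    return fmean(chosen)


# Made-up local fdr values for five features (not results from the paper).
local_fdr = [0.01, 0.02, 0.60, 0.80, 0.95]
# Hypothetical selection: the model retains the first four features.
selected = [True, True, True, True, False]

overall = estimated_fdr(local_fdr, selected)

# Selected features whose own local fdr is below the illustrative 0.10 cutoff.
reliable = [i for i, (rate, keep) in enumerate(zip(local_fdr, selected))
            if keep and rate < 0.10]
```

In this toy input the estimated overall rate is high even though two of the selected features have very small local rates, which is the situation the abstract describes for models chosen by cross-validation.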