Variable Selection with Second-Generation P-Values
Many statistical methods have been proposed for variable selection over the past century, but few perform the task well. The current standard-bearers include smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and the minimax concave penalty with penalized linear unbiased selection (MC+). In practice, however, these algorithms often struggle to balance support recovery and parameter estimation, despite well-established oracle properties in certain settings. Here we report on a novel application of second-generation p-values (SGPVs) to variable selection, which we call Penalized Regression with SGPVs (ProSGPV). This approach has tangible advantages in balancing support recovery and parameter estimation: it captures the true model at the best rate achieved by the current standards, is easier to implement in practice, and yields parameter estimates with the smallest mean absolute error. Even under strong collinearity in the feature space, ProSGPV maintains its good performance by using a simple pre-screening step. We present extensive simulations and two real-world applications comparing these approaches. ProSGPV is a fast and intuitive variable selection algorithm that leverages the advantages of second-generation p-values.
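Since the abstract only names the algorithm, a minimal sketch may help fix ideas. The Python snippet below illustrates the general two-stage recipe behind SGPV-based selection: screen candidates with a lasso fit, then keep only the coefficients whose interval estimates clear a data-driven null interval (i.e., SGPV = 0). Everything here is an assumption for illustration: the function name `sgpv_select`, the cross-validated lasso screen, the mean standard error as the null bound, and the 95% normal-approximation intervals are stand-ins, not the authors' exact ProSGPV tuning rules.

```python
# A minimal sketch of SGPV-style variable selection, assuming a lasso
# screening stage followed by OLS inference on the screened support.
# Illustrative only; not the authors' exact ProSGPV algorithm.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def sgpv_select(X, y):
    """Two-stage selection: lasso screen, then keep coefficients whose
    95% CI lies entirely outside a data-driven null interval."""
    n, p = X.shape

    # Stage 1: lasso screening. Cross-validated lambda is a stand-in
    # for whatever tuning rule the paper actually uses.
    screen = LassoCV(cv=5).fit(X, y)
    support = np.flatnonzero(screen.coef_ != 0)
    if support.size == 0:
        return support

    # Stage 2: OLS on the screened support.
    Xs = X[:, support]
    ols = LinearRegression().fit(Xs, y)
    resid = y - ols.predict(Xs)
    dof = n - Xs.shape[1] - 1
    sigma2 = resid @ resid / dof

    # Standard errors of the slopes (centered design accounts for
    # the fitted intercept).
    Xc = Xs - Xs.mean(axis=0)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xc.T @ Xc)))

    # 95% confidence interval for each coefficient (normal approx.).
    z = 1.96
    lo, hi = ols.coef_ - z * se, ols.coef_ + z * se

    # Null interval: effects smaller than the mean standard error are
    # treated as practically null (an illustrative choice, not the
    # paper's published bound).
    null = se.mean()

    # SGPV = 0 iff the CI and the null interval do not overlap;
    # retain only those variables.
    keep = (lo > null) | (hi < -null)
    return support[keep]

if __name__ == "__main__":
    # Example usage on synthetic data with three true signals.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=200)
    print(sgpv_select(X, y))  # ideally recovers columns {0, 1, 2}
```

In this framing a variable is retained only when its second-generation p-value is exactly zero, meaning its interval estimate is wholly incompatible with the null region; the hard thresholding in the last step mimics that behavior.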