Fast and robust model selection based on ranks
We consider the problem of identifying important predictors in large databases, where the relationship between the response variable and the explanatory variables is specified by the general single index model, with an unknown link function and an unknown distribution of the error term. We use a natural, robust and efficient approach, which replaces the values of the response variable with their ranks and then identifies important predictors using the well-known LASSO. The resulting RankLasso coincides with the previously proposed distribution-based LASSO, although its connection to the rank approach had not been recognized. We refine the consistency results for RankLasso provided in earlier papers and extend the scope of applications of this method by proposing its thresholded and adaptive versions. We present theoretical results showing that, as with the regular LASSO, the proposed modifications are model selection consistent under much weaker assumptions than RankLasso. These theoretical results are illustrated by an extensive simulation study, which shows that the proposed procedures are indeed much more efficient than the vanilla version of RankLasso and that they can properly identify relevant predictors even if the error terms come from the Cauchy distribution. The simulation study also shows that, for model selection, RankLasso performs substantially better than LADLasso, a well-established methodology for robust model selection.
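The basic recipe described in the abstract (replace the response with its ranks, then run the LASSO) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the rank scaling (rank divided by n, then centered), the objective scaling (1/(2n) times the squared error plus `lam` times the L1 norm), the coordinate descent solver, the toy data model (linear index with Cauchy noise) and the penalty value `lam = 0.05` are all choices made here for illustration. Ties among the responses are not handled.

```python
import math
import random


def centered_ranks(y):
    """Replace each response by rank/n and center the result at zero."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    scaled = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scaled[i] = rank / n
    mean = sum(scaled) / n
    return [v - mean for v in scaled]


def standardize(X):
    """Center every column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    Z = [row[:] for row in X]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def rank_lasso(X, y, lam, sweeps=50):
    """LASSO on centered ranks of y, solved by cyclic coordinate descent.

    Minimizes (1/(2n)) * ||r - Z b||^2 + lam * ||b||_1, where r holds the
    centered ranks of y and Z is the standardized design matrix.
    """
    n, p = len(X), len(X[0])
    Z = standardize(X)
    resid = centered_ranks(y)  # residual r - Z b, with b = 0 at the start
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Columns have mean square 1, so the partial-residual
            # correlation is the residual correlation plus beta[j].
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Z[i][j]
                beta[j] = new
    return beta


def simulate(n, p, seed=0):
    """Toy data: only the first three predictors matter; errors are Cauchy."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = []
    for row in X:
        index = 3.0 * (row[0] + row[1] + row[2])
        cauchy = math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF draw
        y.append(index + cauchy)
    return X, y


if __name__ == "__main__":
    X, y = simulate(n=400, p=20)
    beta = rank_lasso(X, y, lam=0.05)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("selected predictors:", selected)
```

Because only the ordering of the responses enters the fit, a few extreme Cauchy draws cannot dominate the loss, which is the intuition behind the robustness claim. The fitted coefficients are on the rank scale, so they indicate which predictors are selected rather than estimating the original regression coefficients directly.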