Robust selection of predictors and conditional outlier detection in a perturbed large-dimensional regression context

04/25/2021
by Matteo Farnè, et al.

This paper presents a fast methodology, called ROBOUT, to identify outliers in a response variable conditional on a set of linearly related predictors retrieved from a large granular dataset. ROBOUT is shown to be effective and particularly versatile compared to existing methods in the presence of several idiosyncratic data features. It can identify observations with outlying conditional variance when the dataset contains element-wise sparse variables and the set of predictors contains multivariate outliers. Existing integrated methodologies such as SPARSE-LTS and RLARS are systematically sub-optimal under those conditions. ROBOUT comprises three stages: a robust selection of the statistically relevant predictors (using a Huber or a quantile loss), the estimation of a robust regression model on the selected predictors (by LTS, GS or MM), and a criterion to identify conditional outliers based on a robust measure of the residuals' dispersion. We conduct a comprehensive simulation study in which the different variants of the proposed algorithm are tested under an exhaustive set of perturbation scenarios. The methodology is also applied to a granular supervisory banking dataset collected by the European Central Bank.
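
The three stages described in the abstract (robust selection, robust regression, residual-based flagging) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the penalty, the tuning constants, or the dispersion measure, so everything below is an assumption made for illustration. In particular, the selection stage assumes an L1-penalised Huber loss fitted by proximal gradient descent, the regression stage is an approximate LTS fit using random starts and concentration steps, and the flagging rule assumes a MAD-based scale with a cutoff of 3. The function names (`huber_lasso`, `lts_fit`, `flag_outliers`), the values of `lam`, `delta`, `h_frac` and `cutoff`, and the simulated data are all hypothetical.

```python
import numpy as np

def huber_lasso(X, y, lam, delta=1.345, n_iter=500):
    """Stage 1 (assumed form): L1-penalised Huber regression via proximal gradient."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])       # intercept column, left unpenalised
    step = n / np.linalg.norm(Z, 2) ** 2       # 1 / Lipschitz bound of the smooth part
    theta = np.zeros(p + 1)
    for _ in range(n_iter):
        psi = np.clip(y - Z @ theta, -delta, delta)   # Huber score of the residuals
        theta = theta + step * (Z.T @ psi) / n        # gradient step on the Huber loss
        b = theta[1:]
        theta[1:] = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft-threshold
    return theta

def lts_fit(X, y, h_frac=0.75, n_starts=50, n_csteps=20, seed=0):
    """Stage 2: approximate least trimmed squares (random starts + concentration steps)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    k, h = Z.shape[1], int(h_frac * n)
    best_obj, best_coef = np.inf, None
    for _ in range(n_starts):
        idx = rng.choice(n, size=k, replace=False)    # small random starting subset
        for _ in range(n_csteps):
            coef = np.linalg.lstsq(Z[idx], y[idx], rcond=None)[0]
            idx = np.argsort((y - Z @ coef) ** 2)[:h] # keep the h best-fitted points
        coef = np.linalg.lstsq(Z[idx], y[idx], rcond=None)[0]
        obj = np.sort((y - Z @ coef) ** 2)[:h].sum()  # trimmed sum of squares
        if obj < best_obj:
            best_obj, best_coef = obj, coef
    return best_coef

def flag_outliers(resid, cutoff=3.0):
    """Stage 3 (assumed rule): flag residuals far from the median in MAD units."""
    centre = np.median(resid)
    scale = 1.4826 * np.median(np.abs(resid - centre))  # MAD, consistent for Gaussian noise
    return np.flatnonzero(np.abs(resid - centre) > cutoff * scale)

# Hypothetical simulated data: 3 relevant predictors out of 50, 10 shifted responses.
rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.standard_normal((n, p))
true_idx = [0, 1, 2]
y = 3.0 * X[:, true_idx].sum(axis=1) + rng.standard_normal(n)
outlier_idx = np.arange(10)
y[outlier_idx] += 20.0                                  # conditional outliers in the response

theta = huber_lasso(X, y, lam=0.3)
selected = np.flatnonzero(theta[1:] != 0.0)             # predictors kept by stage 1
coef = lts_fit(X[:, selected], y)
resid = y - coef[0] - X[:, selected] @ coef[1:]
flagged = flag_outliers(resid)
print("selected predictors:", selected)
print("flagged observations:", flagged)
```

The sketch contaminates only the response. The setting the abstract emphasises, with element-wise sparse variables and multivariate outliers among the predictors, would need a more careful design, and a quantile loss or a GS or MM estimator could replace the stage 1 and stage 2 choices made here.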
