Robust estimation of a regression function in exponential families

11/03/2020
by Yannick Baraud, et al.

We observe n pairs X_1=(W_1,Y_1),…,X_n=(W_n,Y_n) of independent random variables and assume, although this might not be true, that for each i∈{1,…,n}, the conditional distribution of Y_i given W_i belongs to a given exponential family with real parameter θ_i^⋆=θ^⋆(W_i), whose value is a function θ^⋆ of the covariate W_i. Given a model Θ for θ^⋆, we propose an estimator θ̂ with values in Θ whose construction does not depend on the distribution of the W_i and which is robust to contamination, outliers and model misspecification. We establish non-asymptotic exponential inequalities for the upper deviations of a Hellinger-type distance between the true distribution of the data and the estimated one based on θ̂. Under a suitable parametrization of the exponential family, we deduce a uniform risk bound for θ̂ over the class of Hölderian functions and we prove the optimality of this bound up to a logarithmic factor. Finally, we provide an algorithm for calculating θ̂ when θ^⋆ is assumed to belong to functional classes of low or medium dimensions (in a suitable sense) and, in a simulation study, we compare the performance of θ̂ to that of the MLE and median-based estimators. The proof of our main result relies on an upper bound, with explicit numerical constants, on the expectation of the supremum of an empirical process over a VC-subgraph class. This bound may be of independent interest.
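
To make the loss concrete, here is a minimal sketch of how a Hellinger-type distance between two regression functions could be evaluated. It rests on assumptions that are not taken from the abstract: the exponential family is taken to be Poisson in its natural parametrization (mean exp(θ)), the distance is the squared Hellinger distance averaged over the observed covariates, and the names `hellinger_sq_poisson`, `mean_hellinger_sq`, `theta_star` and `theta_fit` are hypothetical. The paper's exact definition may differ.

```python
import math


def hellinger_sq_poisson(lam1, lam2):
    """Squared Hellinger distance between Poisson(lam1) and Poisson(lam2).

    Uses the closed form h^2 = 1 - exp(-(sqrt(lam1) - sqrt(lam2))^2 / 2),
    with the convention h^2 = 1 - (Bhattacharyya coefficient).
    """
    return 1.0 - math.exp(-0.5 * (math.sqrt(lam1) - math.sqrt(lam2)) ** 2)


def mean_hellinger_sq(theta_a, theta_b, covariates):
    """Average squared Hellinger distance over the covariates.

    theta_a and theta_b map a covariate w to a natural parameter theta;
    the Poisson mean is exp(theta). Averaging over the observed covariates
    is an illustrative choice, not the paper's stated definition.
    """
    total = 0.0
    for w in covariates:
        total += hellinger_sq_poisson(math.exp(theta_a(w)), math.exp(theta_b(w)))
    return total / len(covariates)


# Illustrative (made-up) regression functions on a grid of covariates in [0, 1].
covariates = [i / 10 for i in range(11)]
theta_star = lambda w: 1.0 + w        # hypothetical "true" parameter function
theta_fit = lambda w: 1.2 + 0.8 * w   # hypothetical estimate

loss = mean_hellinger_sq(theta_star, theta_fit, covariates)
print(f"averaged squared Hellinger distance: {loss:.4f}")
```

Because the squared Hellinger distance is bounded by 1, a loss of this kind stays bounded even when a few observations are grossly atypical, which is one reason such distances are natural in robust estimation.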
