Pointwise adaptation via stagewise aggregation of local estimates for multiclass classification
We consider the problem of multiclass classification, where the training sample S_n = {(X_i, Y_i)}_{i=1}^n is generated from the model p(Y = m | X = x) = θ_m(x), 1 ≤ m ≤ M, and θ_1(x), ..., θ_M(x) are unknown Lipschitz functions. Given a test point X, our goal is to estimate θ_1(X), ..., θ_M(X). An approach based on nonparametric smoothing uses a localization technique, i.e. the weight of observation (X_i, Y_i) depends on the distance between X_i and X. However, local estimates depend strongly on the localizing scheme. In our solution we fix several schemes W_1, ..., W_K, compute the corresponding local estimates θ^(1), ..., θ^(K), and apply an aggregation procedure. We propose an algorithm that constructs a convex combination of the estimates θ^(1), ..., θ^(K) such that the aggregated estimate behaves approximately as well as the best one from the collection. We also study theoretical properties of the procedure, prove oracle results, and establish rates of convergence under mild assumptions.
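To make the setup concrete, the following minimal Python sketch computes local estimates at a test point for several localizing schemes and combines them with convex weights. It is an illustration under stated assumptions, not the stagewise procedure of the paper: the schemes are taken to be Gaussian kernels with different bandwidths on one-dimensional inputs, and the aggregation step is replaced by simple exponential weighting driven by a global leave-one-out log-loss. The names local_estimate, loo_log_loss, aggregate, and the temperature parameter are hypothetical.

```python
import math


def local_estimate(X, Y, x, h, M):
    """Kernel-weighted class frequencies at x (assumed Gaussian weights, bandwidth h)."""
    weights = [math.exp(-0.5 * ((xi - x) / h) ** 2) for xi in X]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the uniform distribution.
        return [1.0 / M] * M
    theta = [0.0] * M
    for w, y in zip(weights, Y):
        theta[y] += w
    return [t / total for t in theta]


def loo_log_loss(X, Y, h, M, eps=1e-12):
    """Average leave-one-out log-loss of the local estimate with bandwidth h."""
    loss = 0.0
    for i in range(len(X)):
        theta = local_estimate(X[:i] + X[i + 1:], Y[:i] + Y[i + 1:], X[i], h, M)
        loss -= math.log(max(theta[Y[i]], eps))
    return loss / len(X)


def aggregate(X, Y, x, bandwidths, M, temperature=1.0):
    """Convex combination of the local estimates at x.

    Assumption: exponential weights on leave-one-out losses stand in for
    the stagewise aggregation described in the abstract.
    """
    estimates = [local_estimate(X, Y, x, h, M) for h in bandwidths]
    losses = [loo_log_loss(X, Y, h, M) for h in bandwidths]
    best = min(losses)
    raw = [math.exp(-(loss - best) / temperature) for loss in losses]
    norm = sum(raw)
    alphas = [r / norm for r in raw]  # nonnegative, sum to one
    combined = [
        sum(a * est[m] for a, est in zip(alphas, estimates)) for m in range(M)
    ]
    return combined, alphas
```

Here X is a list of floats and Y a list of integer labels in 0, ..., M-1. Because the coefficients are nonnegative and sum to one, the aggregated vector is itself a probability vector over the M classes. Note that the weights in this sketch do not depend on the test point, whereas the procedure described in the abstract adapts pointwise.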