Density-sensitive semisupervised inference
Semisupervised methods are techniques for using labeled data (X_1,Y_1),...,(X_n,Y_n) together with unlabeled data X_{n+1},...,X_N to make predictions. These methods invoke assumptions that link the marginal distribution P_X of X to the regression function f(x). For example, it is common to assume that f is very smooth over high-density regions of P_X. Many of these methods are ad hoc: they have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution P_X. Our model includes a parameter α that controls the strength of the semisupervised assumption. We then use the data to adapt to α.
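To make the idea of a metric that is sensitive to P_X concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction analyzed in the paper: the abstract does not define the metric, so the edge cost used here (Euclidean length divided by an estimated density raised to the power α), the Gaussian kernel density estimate, the complete graph, and the function names `kde` and `density_sensitive_distances` are all hypothetical choices. The sketch only shows the qualitative behavior: with α = 0 the distance reduces to the ordinary Euclidean distance, while with α > 0 paths through low-density regions become more expensive.

```python
import heapq
import math


def kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x from 1-D sample points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / norm


def density_sensitive_distances(points, source, alpha, bandwidth=0.3):
    """Shortest-path distances from points[source] on the complete graph.

    Assumed (hypothetical) edge cost between x_i and x_j:
        |x_i - x_j| / density(midpoint) ** alpha
    The density is evaluated only at the edge midpoint, which is a crude
    approximation of a path integral.
    """
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j in range(n):
            if j == i:
                continue
            mid = 0.5 * (points[i] + points[j])
            cost = abs(points[i] - points[j]) / kde(points, mid, bandwidth) ** alpha
            if d + cost < dist[j]:
                dist[j] = d + cost
                heapq.heappush(heap, (dist[j], j))
    return dist


# A dense cluster on [0, 1] plus one isolated point at 2.0.
points = [i / 10 for i in range(11)] + [2.0]
source = 10  # the point at 1.0

euclidean_like = density_sensitive_distances(points, source, alpha=0.0)
density_aware = density_sensitive_distances(points, source, alpha=1.0)

# Index 0 is the point 0.0 (reached through the dense cluster);
# index 11 is the point 2.0 (reached across the low-density gap).
print(euclidean_like[0], euclidean_like[11])
print(density_aware[0], density_aware[11])
```

Both target points are at Euclidean distance 1 from the source, so the α = 0 run treats them identically. With α = 1 the point across the gap ends up farther away than the point inside the cluster, which is the kind of behavior a smoothness-over-high-density assumption relies on.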