Adaptive Semisupervised Inference
Semisupervised methods inevitably invoke some assumption that links the marginal distribution of the features to the regression function of the label. Most commonly, the cluster or manifold assumptions are used which imply that the regression function is smooth over high-density clusters or manifolds supporting the data. A generalization of these assumptions is that the regression function is smooth with respect to some density sensitive distance. This motivates the use of a density based metric for semisupervised learning. We analyze this setting and make the following contributions - (a) we propose a semi-supervised learner that uses a density-sensitive kernel and show that it provides better performance than any supervised learner if the density support set has a small condition number and (b) we show that it is possible to adapt to the degree of semi-supervisedness using data-dependent choice of a parameter that controls sensitivity of the distance metric to the density. This ensures that the semisupervised learner never performs worse than a supervised learner even if the assumptions fail to hold.
READ FULL TEXT