Analysis of p-Laplacian Regularization in Semi-Supervised Learning
We investigate a family of regression problems in a semi-supervised setting. The task is to assign real-valued labels to a set of n sample points, given a small training subset of N labeled points. A goal of semi-supervised learning is to take advantage of the (geometric) structure provided by the large amount of unlabeled data when assigning labels. We consider random geometric graphs, with connection radius ϵ(n), to represent the geometry of the data set. The functionals that model the task reward the regularity of the estimator function and impose or reward agreement with the training data. Here we consider the discrete p-Laplacian regularization. We investigate the asymptotic behavior when the number of unlabeled points increases while the number of training points remains fixed. We uncover a delicate interplay between the regularizing nature of the functionals considered and the nonlocality inherent to the graph constructions. We rigorously obtain almost optimal ranges on the scaling of ϵ(n) for asymptotic consistency to hold. We prove that the minimizers of the discrete functionals in the random setting converge uniformly to the desired continuum limit. Furthermore, we discover that for the standard model used there is a restrictive upper bound on how quickly ϵ(n) must converge to zero as n → ∞. We introduce a new model which is as simple as the original model but overcomes this restriction.
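To make the objects named above concrete, the following is a minimal sketch of a random geometric graph and a discrete p-Dirichlet energy on it. It is an illustration under stated assumptions, not the paper's construction: the abstract does not specify the kernel or the normalization, so the indicator kernel, the 1/(ϵ^p n²) scaling, the unit-square sampling, and the function names `geometric_graph_weights` and `p_dirichlet_energy` are all hypothetical choices made here.

```python
import math
import random


def geometric_graph_weights(points, eps):
    """Assumed indicator kernel: w_ij = 1 if |x_i - x_j| <= eps (i != j), else 0."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                weights[i][j] = weights[j][i] = 1.0
    return weights


def p_dirichlet_energy(values, weights, eps, p):
    """Assumed normalization: (1 / (eps^p n^2)) * sum_{i,j} w_ij |f_i - f_j|^p."""
    n = len(values)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += weights[i][j] * abs(values[i] - values[j]) ** p
    return total / (eps ** p * n ** 2)


# Sample n points uniformly in the unit square (an assumption for illustration).
random.seed(0)
n, eps, p = 200, 0.2, 3.0
points = [(random.random(), random.random()) for _ in range(n)]
weights = geometric_graph_weights(points, eps)

# A constant labeling has zero energy; a non-constant one is penalized.
constant = [1.0] * n
linear = [x for x, _ in points]
energy_constant = p_dirichlet_energy(constant, weights, eps, p)
energy_linear = p_dirichlet_energy(linear, weights, eps, p)
```

In a semi-supervised problem of the kind described, an energy like this would be minimized over labelings that agree (exactly or approximately) with the N training labels; the sketch only evaluates the regularizer and does not perform that minimization.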