Mixture Proportion Estimation for Positive-Unlabeled Learning via Classifier Dimension Reduction
Positive-unlabeled (PU) learning considers two samples: a positive set P containing observations from only one class, and an unlabeled set U containing observations from two classes. The goal is to classify the observations in U. Estimating the class mixture proportion in U, known as mixture proportion estimation (MPE), is a key step in PU learning. In this paper, we show that PU learning is a generalization of local false discovery rate estimation. Further, we show that MPE for PU learning can be reduced to a one-dimensional problem by constructing a classifier trained on the P and U data sets. These observations enable the application of methodology from the multiple testing literature to the PU learning problem. In particular, we adapt ideas from Storey [2002] and Patra and Sen [2015] to address parameter identifiability and MPE. We prove consistency of two mixture proportion estimators using bounds from empirical process theory, develop tuning-parameter-free implementations, and demonstrate that they have competitive performance on simulated waveform data and a protein signaling problem.
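The two-step idea described above can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the synthetic Gaussian data, the plain logistic-regression classifier, and the fixed threshold choice are all assumptions made for the example. The classifier trained to separate P from U reduces each observation to a one-dimensional score, and a Storey-style ratio of tail frequencies on those scores then upper-bounds the mixture proportion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D Gaussian data (illustrative, not from the paper):
# P holds positives; U is a mixture with true positive proportion alpha_true.
n_p, n_u, alpha_true = 2000, 2000, 0.3
P = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_p, 2))
n_pos = int(alpha_true * n_u)
U = np.vstack([
    rng.normal(loc=[2.0, 2.0], scale=1.0, size=(n_pos, 2)),       # positives in U
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_u - n_pos, 2)), # negatives in U
])

# Step 1: train a classifier labeling P as 1 and U as 0; its score maps each
# multivariate observation to a single dimension. Here: logistic regression
# fit by plain gradient descent so the example stays dependency-free.
X = np.vstack([P, U])
y = np.concatenate([np.ones(n_p), np.zeros(n_u)])
Xb = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
w = np.zeros(Xb.shape[1])
for _ in range(500):
    prob = 1.0 / (1.0 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (prob - y) / len(y)  # gradient step on logistic loss

def score(Z):
    """One-dimensional classifier score for a batch of observations."""
    Zb = np.hstack([Z, np.ones((len(Z), 1))])
    return 1.0 / (1.0 + np.exp(-Zb @ w))

s_p, s_u = score(P), score(U)

# Step 2: Storey-style ratio estimator on the 1-D scores. Positives inside U
# share the score distribution of P, so for a threshold t the tail ratio
#   Pr(score_U > t) / Pr(score_P > t)
# upper-bounds the mixture proportion alpha.
t = np.quantile(s_p, 0.5)  # illustrative threshold, not the paper's tuning-free rule
alpha_hat = np.mean(s_u > t) / np.mean(s_p > t)
```

On this well-separated toy mixture `alpha_hat` lands close to the true proportion 0.3; the paper's contribution is the consistency theory and tuning-parameter-free threshold selection that this fixed-quantile sketch glosses over.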