Scalable Probabilistic Matrix Factorization with Graph-Based Priors
In matrix factorization, available graph side-information may be poorly suited to the matrix completion problem, because some of its edges disagree with the latent-feature relations learnt from the incomplete data matrix. We show that removing these contested edges improves prediction accuracy and scalability. We identify the contested edges through a highly efficient graphical lasso approximation. Identifying and removing contested edges adds no computational complexity relative to state-of-the-art non-probabilistic graph-regularized matrix factorization; the cost remains linear in the number of non-zeros. The computational load even decreases in proportion to the number of edges removed. Formulating a probabilistic generative model and using expectation maximization guarantees convergence. Extensive simulated experiments illustrate the desired properties of the resulting algorithm. In experiments on real data, we demonstrate improved prediction accuracy in four out of five experiments (empirical evidence that graph side-information is often inaccurate) and the same prediction accuracy in the remaining one. A 20-thousand-dimensional graph with 3 million edges can be analyzed in under ten minutes on a standard laptop computer.
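To make the idea of a contested edge concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes a simple cosine-similarity test between latent feature vectors in place of the graphical lasso approximation described above, and all names (`prune_contested_edges`, `graph_penalty`, `threshold`) and the toy data are hypothetical. It also shows why a graph regularizer that sums over edges becomes cheaper to evaluate as edges are removed.

```python
import math

def cosine(u, v):
    """Cosine similarity of two latent feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def prune_contested_edges(edges, latent, threshold=0.0):
    """Split edges into kept and removed (contested) lists.

    Assumption: an edge is treated as contested when the latent vectors of
    its endpoints have cosine similarity at or below `threshold`. This is a
    stand-in heuristic, not the graphical lasso approximation.
    """
    kept, removed = [], []
    for i, j in edges:
        if cosine(latent[i], latent[j]) > threshold:
            kept.append((i, j))
        else:
            removed.append((i, j))
    return kept, removed

def graph_penalty(edges, latent):
    """Graph regularizer: sum of squared distances across edges.

    The loop visits each edge once, so the cost is linear in the number
    of edges and shrinks as edges are removed.
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(latent[i], latent[j]))
        for i, j in edges
    )

# Hypothetical toy data: four items with two latent features each.
latent = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (0, 2), (1, 3)]

kept, removed = prune_contested_edges(edges, latent)
print("kept:", kept)
print("removed:", removed)
```

In this toy example, the edge between items 0 and 2 links latent vectors that point in opposite directions, so it is the one flagged for removal; the regularizer is then evaluated over the two remaining edges only.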