Low-rank matrix denoising for count data using unbiased Kullback-Leibler risk estimation

01/28/2020
by   Jérémie Bigot, et al.
0

This paper is concerned by the analysis of observations organized in a matrix form whose elements are count data assumed to follow a Poisson or a multinomial distribution. We focus on the estimation of either the intensity matrix (Poisson case) or the compositional matrix (multinomial case) that is assumed to have a low rank structure. We propose to construct an estimator minimizing the regularized negative log-likelihood by a nuclear norm penalty. Our approach easily yields a low-rank matrix-valued estimator with positive entries which belongs to the set of row-stochastic matrices in the multinomial case. Then, our main contribution is to propose a data-driven way to select the regularization parameter in the construction of such estimators by minimizing (approximately) unbiased estimates of the Kullback-Leibler (KL) risk in such models, which generalize Stein's unbiased risk estimation originally proposed for Gaussian data. The evaluation of these quantities is a delicate problem, and we introduce novel methods to obtain accurate numerical approximation of such unbiased estimates. Simulated data are used to validate this way of selecting regularizing parameters for low-rank matrix estimation from count data. Examples from a survey study and metagenomics also illustrate the benefits of our approach for real data analysis.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset