Graph inference with clustering and false discovery rate control
This paper introduces a noisy version of the stochastic block model (NSBM) and investigates the following three statistical inference tasks in this model: estimation of the model parameters, clustering of the nodes, and identification of the underlying graph. The first two tasks are carried out with a variational expectation-maximization (VEM) algorithm. Graph inference is carried out by controlling the false discovery rate (FDR), that is, the average proportion of errors among the edges declared significant, while maximizing the true discovery rate (TDR), that is, the average proportion of edges declared significant among the true edges. Provided that the VEM algorithm yields reliable parameter estimates and clustering, we show theoretically that our procedure controls the FDR while satisfying an optimal TDR property, up to remainder terms that become small as the size of the graph grows. Numerical experiments show that our method outperforms the classical FDR-controlling methods that ignore the underlying SBM topology. In addition, these simulations demonstrate that the FDR/TDR properties of our method are robust to model misspecification, that is, they are essentially maintained outside our model.
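To make the two criteria concrete, the following minimal sketch computes, for a single graph, the proportion of errors among the declared edges and the proportion of true edges that are declared. Averaging these proportions over repeated experiments gives empirical counterparts of the FDR and TDR. The function name `fdp_tdp`, the adjacency-matrix representation of an undirected graph, and the convention of returning 0 when a denominator is empty are assumptions made for illustration; this is not the procedure proposed in the paper.

```python
import numpy as np

def fdp_tdp(true_adj, declared_adj):
    """Return (false discovery proportion, true discovery proportion).

    Hypothetical helper: both inputs are symmetric 0/1 adjacency matrices
    of the same undirected graph on n nodes.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    declared_adj = np.asarray(declared_adj, dtype=bool)
    # Count each undirected node pair once (strict upper triangle).
    iu = np.triu_indices(true_adj.shape[0], k=1)
    truth, declared = true_adj[iu], declared_adj[iu]
    false_disc = np.sum(declared & ~truth)  # declared edges that are not true
    true_disc = np.sum(declared & truth)    # true edges that are declared
    # Assumed convention: a ratio with an empty denominator is set to 0.
    fdp = false_disc / max(int(declared.sum()), 1)
    tdp = true_disc / max(int(truth.sum()), 1)
    return float(fdp), float(tdp)

# Toy example on 4 nodes: true edges (0,1), (1,2), (2,3);
# declared edges (0,1), (1,2), (0,3).
true_adj = np.zeros((4, 4), dtype=int)
declared_adj = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    true_adj[i, j] = true_adj[j, i] = 1
for i, j in [(0, 1), (1, 2), (0, 3)]:
    declared_adj[i, j] = declared_adj[j, i] = 1

fdp, tdp = fdp_tdp(true_adj, declared_adj)
```

In this toy example one of the three declared edges is false and two of the three true edges are recovered, so the two proportions are 1/3 and 2/3 respectively.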