A Mixture Model to Detect Edges in Sparse Co-expression Graphs
In the early days of microarray data, the medical and statistical communities focused on gene-level data, and particularly on finding differentially expressed genes. This usually involved the simplifying assumption that genes are independent, which made likelihood derivations feasible and allowed for relatively simple implementations. However, this assumption is not realistic, and in recent years the scope has expanded to include pathway and 'gene set' analyses that attempt to understand the relationships between genes. In this paper we develop a method to recover a gene network's structure from co-expression data, measured as normalized Pearson correlation coefficients between gene pairs. We treat these co-expression measurements as edge weights in the complete graph whose nodes correspond to genes. We assume that the network is sparse, so that only a small fraction of the putative edges are present ('non-null' edges). To decide which edges exist in the gene network, we fit a three-component mixture model in which the observed weights of 'null' edges follow a normal distribution with mean 0, and the weights of non-null edges follow a mixture of two log-normal distributions: one for positively correlated pairs, and one for negatively correlated pairs (whose weights are negative, so the log-normal describes their absolute values). We show that this mixture model, which we call L2N, outperforms other methods in its power to detect edges. We also show that the L2N model allows for control of the false discovery rate. Importantly, the method makes no assumptions about the true network structure.
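The three-component model above has density f(w) = p0·N(w; 0, σ0²) + p1·LN(w; μ1, σ1²) + p2·LN(−w; μ2, σ2²), where the LN terms are zero outside their sign. A minimal sketch of fitting it and calling edges follows, under stated assumptions: it uses a plain EM algorithm with a heuristic starting point, and it controls the FDR with a Bayesian rule on posterior null probabilities (keep the largest set of pairs whose average posterior null probability is at most q). Neither choice is necessarily the authors' exact procedure, and the function names (`fit_mixture_em`, `call_edges`) and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def normal_pdf(w, s0):
    """N(0, s0^2) density: the null-edge component, mean fixed at 0."""
    return np.exp(-0.5 * (w / s0) ** 2) / (s0 * SQRT_2PI)


def lognormal_pdf(x, mu, s):
    """Log-normal density with log-mean mu and log-sd s; zero where x <= 0."""
    out = np.zeros_like(x, dtype=float)
    pos = x > 0
    z = (np.log(x[pos]) - mu) / s
    out[pos] = np.exp(-0.5 * z ** 2) / (x[pos] * s * SQRT_2PI)
    return out


def weighted_densities(w, params):
    """Rows: p0*N(w), p1*LN(w) for positive weights, p2*LN(-w) for negative ones."""
    p, s0, mu1, s1, mu2, s2 = params
    return np.vstack([
        p[0] * normal_pdf(w, s0),
        p[1] * lognormal_pdf(w, mu1, s1),
        p[2] * lognormal_pdf(-w, mu2, s2),
    ])


def weighted_lognormal_mle(y, r):
    """Responsibility-weighted MLE of (mu, sigma) from y = log|w|."""
    mu = np.sum(r * y) / np.sum(r)
    return mu, np.sqrt(np.sum(r * (y - mu) ** 2) / np.sum(r))


def fit_mixture_em(w, n_iter=500, tol=1e-6):
    """Plain EM for the 3-component mixture (assumes both signs are present)."""
    pos, neg = w > 0, w < 0
    log_pos, log_neg = np.log(w[pos]), np.log(-w[neg])
    # Heuristic start (an assumption): log-normals near the upper decile of |w|.
    params = (np.array([0.9, 0.05, 0.05]), np.std(w),
              np.quantile(log_pos, 0.9), 0.5, np.quantile(log_neg, 0.9), 0.5)
    prev_ll = -np.inf
    for _ in range(n_iter):
        dens = weighted_densities(w, params)      # E-step
        total = dens.sum(axis=0)
        r = dens / total                          # responsibilities, shape (3, n)
        ll = np.sum(np.log(total))                # log-likelihood of current params
        p = r.mean(axis=1)                        # M-step: mixing proportions
        s0 = np.sqrt(np.sum(r[0] * w ** 2) / np.sum(r[0]))  # null sd, mean fixed at 0
        mu1, s1 = weighted_lognormal_mle(log_pos, r[1, pos])
        mu2, s2 = weighted_lognormal_mle(log_neg, r[2, neg])
        params = (p, s0, mu1, s1, mu2, s2)
        if ll - prev_ll < tol:                    # stop once the likelihood stalls
            break
        prev_ll = ll
    return params


def call_edges(w, params, q=0.05):
    """Flag edges via a Bayesian FDR rule on posterior null probabilities."""
    dens = weighted_densities(w, params)
    post_null = dens[0] / dens.sum(axis=0)
    order = np.argsort(post_null)
    # Mean posterior null probability of the k most confident pairs; nondecreasing in k.
    running = np.cumsum(post_null[order]) / np.arange(1, w.size + 1)
    k = int(np.sum(running <= q))
    is_edge = np.zeros(w.size, dtype=bool)
    is_edge[order[:k]] = True
    return is_edge, post_null


# Hypothetical synthetic weights: 9000 null pairs, 500 positive and 500 negative edges.
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(0.0, 1.0, 9000),
                    rng.lognormal(1.5, 0.3, 500),
                    -rng.lognormal(1.5, 0.3, 500)])
truth = np.r_[np.zeros(9000, dtype=bool), np.ones(1000, dtype=bool)]
params = fit_mixture_em(w)
is_edge, post_null = call_edges(w, params, q=0.05)
print("estimated p0:", round(float(params[0][0]), 3), "edges called:", int(is_edge.sum()))
```

Fixing the null mean at 0 mirrors the model as stated; the sign split means a positive weight can only be explained by the null or the positive log-normal component, and a negative weight only by the null or the negative one. With real co-expression data the null variance, mixing proportions, and starting values would need more care than this sketch gives them.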