Optimal Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics
Integrating summary statistics from genome-wide association study (GWAS) and expression quantitative trait loci (eQTL) data provides a powerful way of identifying genes whose expression levels are causally associated with complex diseases. A parameter that quantifies the genetic sharing (colocalisation) between a disease phenotype and the expression of a given gene is first introduced; it is computed from the summary statistics and defined through the mean values of two Gaussian sequences. Specifically, given two independent samples $X \sim N(\theta, I_n)$ and $Y \sim N(\mu, I_n)$, the parameter of interest is $T(\theta, \mu) = n^{-1} \sum_{i=1}^{n} |\theta_i| \cdot |\mu_i|$, a non-smooth functional that characterizes the degree of shared signals between the two absolute normal mean vectors $|\theta|$ and $|\mu|$. Using approximation theory and Hermite polynomials, a sparse absolute colocalisation estimator (SpACE) is constructed and shown to be minimax rate optimal over sparse parameter spaces. Our simulations demonstrate that the proposed estimator outperforms naive alternatives, yielding smaller estimation errors. In addition, the method is robust to block-wise correlated observations arising from linkage disequilibrium. The method is applied to an integrative analysis of heart failure genomics data sets and identifies several genes and biological pathways that are possibly causal for human heart failure.
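To make the target quantity concrete, the following minimal Python sketch evaluates the functional T(θ, μ) defined above on simulated sparse means and compares it with a naive plug-in estimate that simply substitutes the observations X and Y for θ and μ. This is not the SpACE estimator, whose construction is not given in the abstract; the function names, the sparsity level, and the signal strength are all hypothetical choices made for illustration. The sketch only shows why a naive approach can be badly biased: at coordinates where both means are zero, |X_i|·|Y_i| still has expectation 2/π, so pure noise accumulates in the average.

```python
import numpy as np


def true_functional(theta, mu):
    """T(theta, mu) = n^{-1} * sum_i |theta_i| * |mu_i|."""
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.mean(np.abs(theta) * np.abs(mu)))


def naive_plugin_estimate(x, y):
    """Naive plug-in: substitute the observations for the unknown means.

    Illustration only; this is NOT the SpACE estimator from the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(x) * np.abs(y)))


def simulate(n=10_000, n_shared=50, signal=3.0, seed=0):
    """Draw X ~ N(theta, I_n) and Y ~ N(mu, I_n) with sparse shared signals.

    All parameter values are hypothetical. Returns (true T, naive estimate).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)
    mu = np.zeros(n)
    theta[:n_shared] = signal
    mu[:n_shared] = -signal  # opposite signs still count as shared signal
    x = theta + rng.standard_normal(n)
    y = mu + rng.standard_normal(n)
    return true_functional(theta, mu), naive_plugin_estimate(x, y)
```

Calling `simulate()` returns the true value of T alongside the naive estimate. Under these assumed settings the naive estimate is dominated by the noise contribution from the null coordinates, which is the kind of error a sparsity-aware estimator is designed to avoid.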