On using empirical null distribution in Benjamini-Hochberg procedure
When performing multiple testing, adjusting the distribution of the null hypotheses is ubiquitous in applications. However, the cost of such an operation remains largely unknown in terms of false discovery proportion (FDP) and true discovery proportion (TDP). We explore this issue in the most classical case where the null hypotheses are Gaussian with unknown rescaling parameters (mean and variance) and where the Benjamini-Hochberg (BH) procedure is applied after a data-rescaling step. Our main result identifies the following sparsity boundary: an asymptotically optimal rescaling (in a specific sense) exists if and only if the sparsity k (the number of false nulls) is of order less than n/log(n), where n is the total number of tests. Our proof relies on new non-asymptotic lower bounds on the FDP/TDP, which are of independent interest and share similarities with those developed in minimax robust statistical theory. Further sparsity boundaries are derived for general location models in which the shape of the null distribution is not necessarily Gaussian.
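To illustrate the setting, the following is a minimal sketch (not the paper's procedure) of applying BH after a data-rescaling step: the null mean and variance are estimated empirically (here with a median/MAD estimator, one possible choice among many), the test statistics are rescaled accordingly, and the standard BH step-up rule is run on the resulting p-values. The function name, the estimator, and the simulated data are illustrative assumptions.

```python
# Illustrative sketch: empirical-null rescaling followed by Benjamini-Hochberg.
# The median/MAD rescaling is an assumed choice, not the paper's optimal rescaling.
import numpy as np
from scipy.stats import norm

def empirical_null_bh(x, alpha=0.05):
    """Rescale statistics by an estimated null (median/MAD), then run BH step-up."""
    mu_hat = np.median(x)                                         # empirical null mean
    sigma_hat = np.median(np.abs(x - mu_hat)) / norm.ppf(0.75)    # MAD-based null std
    p = norm.sf((x - mu_hat) / sigma_hat)                         # one-sided p-values under rescaled null

    n = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, n + 1) / n                  # BH step-up thresholds
    below = p[order] <= thresholds
    k_hat = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(n, dtype=bool)
    rejected[order[:k_hat]] = True                                # reject the k_hat smallest p-values
    return rejected

# Toy example: n = 10000 tests, k = 50 false nulls with shifted means.
rng = np.random.default_rng(0)
x = rng.normal(size=10000)
x[:50] += 4.0
print(empirical_null_bh(x).sum(), "rejections")
```

The sparsity boundary in the abstract concerns precisely how well such a rescaling step can mimic the oracle null in terms of FDP/TDP as k grows relative to n/log(n).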