Combining independent p-values in replicability analysis: A comparative study
Given a family of null hypotheses H_1,…,H_s, we are interested in the hypothesis H_s^γ that at most γ-1 of these null hypotheses are false. Assuming that the corresponding p-values are independent, we investigate combined p-values that are valid for testing H_s^γ. For various settings in which H_s^γ is false, we determine which combined p-value performs well. Via simulations, we find that the Stouffer method works well if the null p-values are uniformly distributed and the signal strength is low, whereas the Fisher method works better if the null p-values are conservative, i.e., stochastically larger than a uniformly distributed random variable. The minimum method works well if the evidence for the rejection of H_s^γ is concentrated in only a few non-null p-values, especially if the null p-values are conservative. Methods that incorporate the combination of e-values work well if the null hypotheses H_1,…,H_s are simple.
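To make the combined p-values concrete, here is a minimal sketch in Python. It assumes the common partial-conjunction construction, in which the γ-1 smallest p-values are discarded and the remaining s-γ+1 largest ones are combined; the abstract does not spell out the construction, so this is an illustration and not necessarily the exact procedure studied in the paper. The function names (`fisher_pc`, `stouffer_pc`, `minimum_pc`) are hypothetical, and the minimum rule is written in its Tippett form, which is itself an assumption.

```python
import math
from statistics import NormalDist


def _largest(pvalues, gamma):
    """Return the s - gamma + 1 largest p-values, sorted ascending."""
    s = len(pvalues)
    if not 1 <= gamma <= s:
        raise ValueError("gamma must satisfy 1 <= gamma <= s")
    # Assumption: p-values lie strictly inside (0, 1), so that the
    # logarithm and the normal quantile below are finite.
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    return sorted(pvalues)[gamma - 1:]


def fisher_pc(pvalues, gamma):
    """Fisher combination of the s - gamma + 1 largest p-values."""
    tail = _largest(pvalues, gamma)
    k = len(tail)
    half = -sum(math.log(p) for p in tail)  # half of -2 * sum(log p)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


def stouffer_pc(pvalues, gamma):
    """Stouffer (inverse normal) combination of the largest p-values."""
    tail = _largest(pvalues, gamma)
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in tail) / math.sqrt(len(tail))
    return 1.0 - nd.cdf(z)


def minimum_pc(pvalues, gamma):
    """Tippett-style minimum rule applied to the largest p-values."""
    tail = _largest(pvalues, gamma)
    # tail[0] is the gamma-th smallest p-value overall.
    return 1.0 - (1.0 - tail[0]) ** len(tail)
```

For example, `fisher_pc([0.01, 0.02, 0.03, 0.5], gamma=2)` combines only 0.02, 0.03 and 0.5. With γ = 1 the three functions reduce to the usual global-null combinations, and with γ = s each returns the largest p-value.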