Empirical Bayes Multistage Testing for Large-Scale Experiments
Modern A/B testing is challenging because it is large in scale along several dimensions, which demands flexible procedures that can handle multiple testing sequentially. The state-of-the-art practice first reduces the observed data stream to always-valid p-values and then chooses a cut-off as in conventional multiple testing schemes. Here we propose an alternative method, AMSET (adaptive multistage empirical Bayes test), which incorporates historical data into decision-making to achieve efficiency gains while retaining marginal false discovery rate (mFDR) control that is immune to peeking. We also show that AMSET with fully data-driven estimation performs robustly across various simulation settings and on real data from a large mobile app social network company.
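
For orientation, the second step of the baseline practice described above (choosing a cut-off over a set of p-values) can be sketched with the Benjamini-Hochberg step-up rule. This is only an illustration of one conventional multiple testing scheme, not AMSET and not necessarily the scheme the authors compare against. The function name `benjamini_hochberg`, the level `alpha`, and the example p-values are assumptions made for this sketch, and the inputs are taken to be always-valid p-values that were already computed elsewhere.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Assumption: p_values are (always-valid) p-values, one per experiment,
    computed upstream; this function only chooses the cut-off.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values for five concurrent experiments.
p_vals = [0.001, 0.045, 0.02, 0.2, 0.008]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
print(rejected)
```

With these made-up inputs, the three smallest p-values fall under their thresholds, so experiments 0, 2, and 4 are flagged. The abstract's proposal differs from this baseline in that it uses historical data in an empirical Bayes fashion rather than a cut-off on p-values alone.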