On Post-Selection Inference in A/B Tests
When many statistical inferences are conducted simultaneously, otherwise unbiased estimators become biased if a subset of results is purposefully selected for reporting according to some criterion. This situation is common in A/B tests, where many metrics and segments are available and only statistically significant results are considered. This paper proposes two approaches to post-selection inference, one based on supervised learning techniques and the other on empirical Bayes. We argue that these two views can be unified, and we conduct a large-scale simulation and empirical study to benchmark our proposals against existing methods. Results show that our methods substantially improve both point estimation and confidence interval coverage.
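To make the selection-bias problem concrete, the following is a minimal sketch (not the paper's exact method) of an empirical-Bayes-style correction for the "winner's curse": observed lifts are assumed to follow x_i ~ N(theta_i, s_i^2) with a common prior theta_i ~ N(mu, tau^2), and the prior parameters are estimated from all metrics before shrinking each raw estimate. The function name and example numbers are illustrative.

```python
# Hypothetical sketch: empirical Bayes shrinkage to counteract selection of
# the most significant A/B metrics after the fact. Assumes a normal-normal
# model x_i ~ N(theta_i, s_i^2), theta_i ~ N(mu, tau^2).
import numpy as np

def empirical_bayes_shrinkage(effects, std_errors):
    """Shrink raw effect estimates toward the grand mean.

    effects    : observed lifts, one per metric/segment
    std_errors : their standard errors
    returns    : (shrunk point estimates, posterior standard deviations)
    """
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(std_errors, dtype=float) ** 2

    # Method-of-moments estimates of the prior: grand mean and
    # between-metric variance in excess of sampling noise.
    mu_hat = effects.mean()
    tau2_hat = max(effects.var(ddof=1) - variances.mean(), 0.0)

    # Posterior mean: weighted average of the raw estimate and the prior mean.
    weight = tau2_hat / (tau2_hat + variances)
    shrunk = weight * effects + (1.0 - weight) * mu_hat

    # Posterior standard deviation under the normal-normal model.
    post_sd = np.sqrt(tau2_hat * variances / (tau2_hat + variances))
    return shrunk, post_sd

# Example: the largest raw lift is pulled back toward the grand mean,
# reducing the bias introduced by reporting only the "winning" metric.
raw = [0.08, 0.01, -0.02, 0.03, 0.00]
ses = [0.03, 0.03, 0.03, 0.03, 0.03]
print(empirical_bayes_shrinkage(raw, ses))
```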