Multiple Testing Embedded in an Aggregation Tree to Identify where Two Distributions Differ
A key goal of flow cytometry data analysis is to identify the subpopulation of cells whose attributes respond to a treatment. These cells are expected to be sparse within the entire cell population. To identify them, we propose a novel multiple TEsting on the Aggregation tree Method (TEAM) that locates where the treated and control distributions differ. TEAM has a bottom-up hierarchical structure. On the bottom layer, it searches for short-range, spiky distributional differences; on the higher layers, it searches for long-range, weak distributional differences. From layer two onward, the nested hypotheses on each layer are formed from the testing results of the previous layers, and the rejection rule also depends on the previous layers. Under mild conditions, we prove that TEAM yields consistent layer-specific and overall false discovery proportions (FDP). We also show that when there are sufficiently many long-range, weak distributional differences, TEAM has better power than single-layer multiple testing methods. Simulations under different settings verify our theoretical results. As an illustration, we apply TEAM to a flow cytometry study, where it successfully identifies the cell subpopulation that is responsive to the cytomegalovirus antigen.
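To make the bottom-up idea concrete, the following is a minimal Python sketch of testing on an aggregation tree. It is not the TEAM procedure itself: the abstract does not give the test statistic or the layer-dependent rejection rule, so this sketch assumes one-dimensional bins of treated and control counts, a one-sided normal-approximation binomial test against the pooled treated proportion, and a plain Benjamini-Hochberg step on each layer as a stand-in. The function names `bh_reject`, `upper_tail_pvalue`, and `aggregate_and_test` are hypothetical.

```python
import math


def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up: return the set of rejected indices."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank  # largest rank whose p-value clears its threshold
    return set(order[:k])


def upper_tail_pvalue(treated, total, p0):
    """One-sided p-value for treated ~ Binomial(total, p0), normal approximation."""
    if total == 0:
        return 1.0
    z = (treated - total * p0) / math.sqrt(total * p0 * (1 - p0))
    return 0.5 * math.erfc(z / math.sqrt(2))  # equals 1 - Phi(z)


def aggregate_and_test(treated, control, layers=3, alpha=0.05):
    """Return, for each layer, the sorted bottom-layer bins rejected there.

    Assumption (not from the source): plain BH on every layer, and
    non-rejected neighbors are merged pairwise to form the next layer.
    """
    # Pooled treated proportion serves as the null value for every node.
    p0 = sum(treated) / (sum(treated) + sum(control))
    # Each node: (bottom-layer bin indices, treated count, total count).
    nodes = [([i], t, t + c) for i, (t, c) in enumerate(zip(treated, control))]
    discoveries = []
    for _ in range(layers):
        pvals = [upper_tail_pvalue(t, n, p0) for _, t, n in nodes]
        rejected = bh_reject(pvals, alpha)
        discoveries.append(sorted(b for i in rejected for b in nodes[i][0]))
        # Only non-rejected nodes move up; an unpaired last node is dropped.
        survivors = [node for i, node in enumerate(nodes) if i not in rejected]
        nodes = [(a[0] + b[0], a[1] + b[1], a[2] + b[2])
                 for a, b in zip(survivors[0::2], survivors[1::2])]
        if not nodes:
            break
    return discoveries


# A weak excess spread over bins 0-3 is missed bin by bin on layer 1
# but detected after one round of aggregation.
treated = [58, 58, 58, 58, 42, 42, 42, 42]
control = [42, 42, 42, 42, 58, 58, 58, 58]
print(aggregate_and_test(treated, control))  # [[], [0, 1, 2, 3], []]
```

In this toy input no single bin is significant on the bottom layer, yet the merged nodes on layer two are, which is the kind of long-range weak difference the higher layers are meant to pick up. A short-range spike, by contrast, would already be rejected on layer one and excluded from later aggregation.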