Selective inference for the problem of regions via multiscale bootstrap

11/02/2017
by Yoshikazu Terada, et al.

Selective inference procedures are considered for computing approximately unbiased p-values in hypothesis testing, using nonparametric bootstrap resampling without direct access to the parameter space or the null distribution. A typical example is assessing the uncertainty of hierarchical clustering, where we can easily compute a frequentist confidence level for each cluster by counting how many times it appears in bootstrap replicates. This is implemented in the R package "pvclust", and we are going to extend it for selective inference: p-values are computed only for the obtained clusters, testing the null hypothesis that the cluster is not true. This is formulated as the "problem of regions", where hypotheses are represented as arbitrarily shaped regions in a parameter space. Geometric quantities, namely the signed distance and the mean curvature, determine the frequentist confidence level. Our idea is to estimate these geometric quantities by the multiscale bootstrap, in which we change the sample size of bootstrap replicates. Our method is second-order accurate in the large-sample theory of smooth boundary surfaces of regions, and it is also justified for nonsmooth surfaces. Our p-values are asymptotically equivalent to those of the more computationally intensive iterated bootstrap.
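
The multiscale idea can be illustrated with a minimal Python sketch on a toy problem. This is not the pvclust implementation. The region H = {mean <= 0}, the data, and all function names are hypothetical. The abstract does not state the formulas used here, so they are assumptions taken from the convention common in the multiscale bootstrap literature: the linear fit sigma * Phi^{-1}(1 - BP) ~ beta0 + beta1 * sigma^2 with sigma^2 = n / n', the p-value 1 - Phi(beta0 - beta1), and its selective version divided by 1 - Phi(-beta1). The selective formula further assumes that the selection event is the complement of H. Here beta0 plays the role of the signed distance and beta1 that of the curvature term.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bootstrap_probability(x, n_prime, n_boot, rng):
    """Fraction of size-n' resamples whose mean lies in H = {mean <= 0}."""
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(x, k=n_prime)  # draw n' points with replacement
        if sum(resample) <= 0.0:
            hits += 1
    return hits / n_boot


def multiscale_fit(x, sizes, n_boot=4000, seed=0):
    """Fit sigma * Phi^{-1}(1 - BP) ~ beta0 + beta1 * sigma^2 by least squares."""
    rng = random.Random(seed)
    n = len(x)
    scales, psi = [], []
    for n_prime in sizes:
        bp = bootstrap_probability(x, n_prime, n_boot, rng)
        # Keep BP away from 0 and 1 so the normal quantile stays finite.
        bp = min(max(bp, 0.5 / n_boot), 1.0 - 0.5 / n_boot)
        sigma2 = n / n_prime  # scale induced by the replicate sample size
        scales.append(sigma2)
        psi.append(math.sqrt(sigma2) * STD_NORMAL.inv_cdf(1.0 - bp))
    mean_s = sum(scales) / len(scales)
    mean_p = sum(psi) / len(psi)
    sxx = sum((s - mean_s) ** 2 for s in scales)
    sxy = sum((s - mean_s) * (p - mean_p) for s, p in zip(scales, psi))
    beta1 = sxy / sxx                 # curvature-like term
    beta0 = mean_p - beta1 * mean_s   # signed-distance-like term
    return beta0, beta1


def p_values(beta0, beta1):
    """Return (p_au, p_si) under the assumed formulas described above."""
    def upper_tail(z):
        return 1.0 - STD_NORMAL.cdf(z)

    p_au = upper_tail(beta0 - beta1)
    p_si = p_au / upper_tail(-beta1)  # assumes selection event = complement of H
    return p_au, p_si


# Deterministic toy data: normal scores shifted so the sample mean is 0.15.
n = 100
x = [STD_NORMAL.inv_cdf((i + 0.5) / n) + 0.15 for i in range(n)]

beta0, beta1 = multiscale_fit(x, sizes=[200, 133, 100, 67, 50])
p_au, p_si = p_values(beta0, beta1)
print(f"beta0={beta0:.3f} beta1={beta1:.3f} p_au={p_au:.4f} p_si={p_si:.4f}")
```

Because the boundary of this toy region is flat, the fitted beta1 should be close to zero and beta0 close to the standardized sample mean. The selective p-value is then roughly twice the non-selective one, reflecting the conditioning on having observed a positive mean.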
