p-Value as the Strength of Evidence Measured by Confidence Distribution
The notion of p-value is a fundamental concept in statistical inference and has been widely used for reporting outcomes of hypothesis tests. However, the p-value is often misinterpreted, misused, or miscommunicated in practice. Part of the issue is that existing definitions of the p-value are often derived from constructions under specific settings, and a general definition that directly reflects the evidence for the null hypothesis is not yet available. In this article, we first propose a general and rigorous definition of the p-value that fulfills two performance-based characteristics. This performance-based definition subsumes all existing construction-based definitions of the p-value and justifies their interpretations. The paper further presents a specific approach based on the confidence distribution to formulate and calculate p-values. This way of computing p-values has two main advantages. First, it is applicable to a wide range of hypothesis testing problems, including standard one- and two-sided tests, tests with an interval-type null, intersection-union tests, multivariate tests, and so on. Second, it naturally leads to a coherent interpretation of the p-value as evidence in support of the null hypothesis, as well as a meaningful measure of the degree of such support. In particular, it gives meaning to a large p-value: for example, a p-value of 0.8 indicates more support for the null hypothesis than a p-value of 0.5. Numerical examples are used to illustrate the wide applicability and computational feasibility of our approach. We show that our proposal is effective and can be applied broadly, without further consideration of the form or size of the null space. In contrast, for some of these problems, solutions based on existing testing methods are not available or cannot be easily obtained.
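As a rough illustration of the confidence-distribution (CD) idea, the sketch below assumes the simplest textbook setting: a normal mean with known standard deviation, where the CD function is H(θ) = Φ((θ − x̄)/(σ/√n)). Under that assumption, the CD mass on a one-sided null θ ≤ b is H(b), which coincides with the usual one-sided p-value, and for a point null θ = θ0 the quantity 2·min(H(θ0), 1 − H(θ0)) recovers the usual two-sided p-value. The function names are hypothetical, and this is only a sketch of the general idea, not the authors' construction; the paper's handling of interval-type, intersection-union, and multivariate nulls is not reproduced here.

```python
import math


def normal_cd(theta, xbar, sigma, n):
    """Assumed CD function H(theta) for a normal mean with known sigma."""
    se = sigma / math.sqrt(n)
    z = (theta - xbar) / se
    # Standard normal CDF written via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_one_sided(b, xbar, sigma, n):
    """CD support for H0: theta <= b, i.e. the CD mass on (-inf, b]."""
    return normal_cd(b, xbar, sigma, n)


def p_two_sided(theta0, xbar, sigma, n):
    """CD-based p-value for the point null H0: theta = theta0."""
    h = normal_cd(theta0, xbar, sigma, n)
    return 2.0 * min(h, 1.0 - h)


if __name__ == "__main__":
    # Hypothetical data summary: sample mean 0.3, sigma 1, n = 25 (se = 0.2).
    xbar, sigma, n = 0.3, 1.0, 25
    print(round(p_one_sided(0.0, xbar, sigma, n), 4))  # ≈ 0.0668, little support for theta <= 0
    print(round(p_two_sided(0.0, xbar, sigma, n), 4))  # ≈ 0.1336
    print(round(p_one_sided(0.5, xbar, sigma, n), 4))  # ≈ 0.8413, strong support for theta <= 0.5
```

In this toy setting, a larger value of H(b) means that more of the confidence distribution lies in the null region, which is one way to read the claim that a p-value of 0.8 reflects more support than 0.5.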