A safe Hosmer-Lemeshow test
This technical report proposes an alternative to the Hosmer-Lemeshow (HL) test for evaluating the calibration of probability forecasts for binary events. The approach is based on e-values, a new tool for hypothesis testing. An e-value is a nonnegative random variable whose expected value under the null hypothesis is less than or equal to 1. Large e-values give evidence against the null hypothesis, and, by Markov's inequality, the multiplicative inverse of an e-value is a valid p-value. In a simulation study, the proposed e-values detect even slight miscalibration at larger sample sizes, albeit with lower power than the original HL test.
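The defining property of an e-value, and its link to p-values, can be illustrated with a generic likelihood-ratio construction for binary forecasts. This is a minimal sketch and not the e-value proposed in the report. The null says each outcome is Bernoulli with the stated forecast probability. The alternative is a hypothetical, fixed recalibration map (`shift_logit`, which shifts forecasts on the logit scale) that is chosen before the data are seen. The names `calibration_e_value`, `exact_null_expectation`, `p_from_e`, `shift_logit`, and `simulate` are illustrative assumptions.

```python
import itertools
import math
import random


def shift_logit(delta):
    """Hypothetical alternative: shift each forecast by `delta` on the logit scale."""
    def alternative(p):
        return 1.0 / (1.0 + math.exp(-(math.log(p / (1.0 - p)) + delta)))
    return alternative


def calibration_e_value(forecasts, outcomes, alternative):
    """Likelihood-ratio e-value for H0: outcome i ~ Bernoulli(forecasts[i]), independently.

    Each factor q/p (if y = 1) or (1-q)/(1-p) (if y = 0) has expectation
    p * q/p + (1-p) * (1-q)/(1-p) = 1 under H0, so the product is an e-value.
    """
    log_e = 0.0
    for p, y in zip(forecasts, outcomes):
        q = alternative(p)
        if y == 1:
            log_e += math.log(q) - math.log(p)
        else:
            log_e += math.log(1.0 - q) - math.log(1.0 - p)
    return math.exp(log_e)


def exact_null_expectation(forecasts, alternative):
    """Sum E(y) * P_H0(y) over all outcome vectors; it equals 1 (up to rounding)."""
    total = 0.0
    for ys in itertools.product((0, 1), repeat=len(forecasts)):
        prob = 1.0
        for p, y in zip(forecasts, ys):
            prob *= p if y == 1 else 1.0 - p
        total += prob * calibration_e_value(forecasts, ys, alternative)
    return total


def p_from_e(e):
    """Markov's inequality gives P_H0(E >= 1/alpha) <= alpha, so min(1, 1/E) is a p-value."""
    return min(1.0, 1.0 / e)


def simulate(n, true_shift, seed):
    """Hypothetical data: forecasts in (0.1, 0.9); outcomes drawn from shifted probabilities."""
    rng = random.Random(seed)
    truth = shift_logit(true_shift)
    forecasts = [0.1 + 0.8 * rng.random() for _ in range(n)]
    outcomes = [1 if rng.random() < truth(p) else 0 for p in forecasts]
    return forecasts, outcomes


# Usage: miscalibrated forecasts (true_shift=0.5) versus calibrated ones (true_shift=0.0).
forecasts, outcomes = simulate(2000, true_shift=0.5, seed=1)
e = calibration_e_value(forecasts, outcomes, shift_logit(0.5))
print("e-value:", e, "p-value:", p_from_e(e))
```

Fixing the alternative in advance is what keeps the expectation at most 1. A test that must detect miscalibration of unknown direction and size would instead choose each factor's alternative from past outcomes only, which preserves the e-value property while adapting to the data.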