An evaluation of intrusive instrumental intelligibility metrics
Instrumental intelligibility metrics are commonly used as an alternative to intelligibility listening tests. This paper evaluates 12 existing monaural intrusive instrumental intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and sEPSM^corr. The intelligibility data used in the evaluation were obtained from ten listening tests described in the literature. The stimuli included speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. STOI, which is arguably the most popular intelligibility metric, achieved an average correlation with listening test scores of ρ=0.80, and its successor, ESTOI, achieved ρ=0.86. The metrics with the highest overall performance were SIIB (ρ=0.92) and HASPI (ρ=0.89). The results show that many intelligibility metrics perform poorly on data sets that were not used during their development. Caution should therefore be taken when using intelligibility metrics to replace listening tests, especially in situations where the accuracy of the metric has not been verified.
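As an illustration of the kind of figure reported above, the following minimal sketch computes a correlation between metric outputs and listening test scores for each data set and then averages across data sets. The abstract does not state which correlation coefficient ρ denotes or whether a mapping function is fitted first, so the plain Pearson coefficient is an assumption here. The function names `pearson` and `mean_correlation` are hypothetical, and the numbers are toy placeholders, not data from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_correlation(datasets):
    """Correlate metric output with listening scores per data set,
    then average the per-data-set correlations (assumed procedure)."""
    per_set = {
        name: pearson(metric, scores)
        for name, (metric, scores) in datasets.items()
    }
    average = sum(per_set.values()) / len(per_set)
    return per_set, average


# Toy placeholder values only: (metric outputs, percent words correct).
toy_datasets = {
    "toy_set_a": ([0.1, 0.4, 0.6, 0.9], [10.0, 35.0, 70.0, 95.0]),
    "toy_set_b": ([0.2, 0.5, 0.8], [20.0, 50.0, 80.0]),
}

per_set, average = mean_correlation(toy_datasets)
for name, rho in per_set.items():
    print(f"{name}: rho = {rho:.3f}")
print(f"average rho = {average:.3f}")
```

Averaging per-data-set correlations, rather than pooling all conditions into one correlation, keeps a single large data set from dominating the summary; whether the paper does exactly this is not stated in the abstract.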