A Crowdsourcing Extension of the ITU-T Recommendation P.835 with Validation
The quality of speech communication systems that include noise suppression algorithms is typically evaluated in laboratory experiments according to ITU-T Rec. P.835. In this paper, we introduce an open-source implementation of ITU-T Rec. P.835 for crowdsourcing, following the crowdsourcing recommendations of ITU-T Rec. P.808. The implementation is an extension of the P.808 Toolkit and is highly automated to avoid operational errors. To assess the validity of our evaluation method, we compared the Mean Opinion Scores (MOS) calculated from ratings collected with our implementation against the MOS values from a standard laboratory experiment conducted according to ITU-T Rec. P.835. Results show high validity on all three scales (average Pearson correlation coefficient, PCC = 0.961). Results of a round-robin test showed that our implementation is a highly reproducible evaluation method (PCC = 1.00). Finally, we evaluated the performance of five deep noise suppression models using our P.835 implementation and show what insights can be gained.
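The validity check described above amounts to averaging ratings per test condition into a MOS and then correlating the crowdsourced MOS values with the laboratory ones. The following is a minimal sketch of that computation for a single rating scale, not the toolkit's actual code: the function names and the `(condition_id, rating)` input layout are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def mean_opinion_scores(ratings):
    """Average ratings per condition.

    `ratings` is an iterable of (condition_id, rating) pairs; this layout
    is an assumption for illustration, not the toolkit's data format.
    """
    by_condition = defaultdict(list)
    for condition, rating in ratings:
        by_condition[condition].append(rating)
    return {c: sum(v) / len(v) for c, v in by_condition.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mos_agreement(crowd_ratings, lab_ratings):
    """PCC between crowdsourced and laboratory MOS over shared conditions."""
    crowd = mean_opinion_scores(crowd_ratings)
    lab = mean_opinion_scores(lab_ratings)
    shared = sorted(crowd.keys() & lab.keys())
    return pearson([crowd[c] for c in shared], [lab[c] for c in shared])
```

Under these assumptions, `mos_agreement` would be called once per P.835 scale, and the per-scale coefficients averaged to obtain a single summary figure like the one reported.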