Disentangled Feature for Weakly Supervised Multi-class Sound Event Detection
We propose a disentangled feature for weakly supervised multi-class sound event detection (SED). It improves both the performance and the training efficiency of class-wise attention based detection systems by introducing more class-wise prior information and by reducing redundant network weights. In this paper, we approach SED as a multiple instance learning (MIL) problem and use a neural network framework with a class-wise attention pooling (cATP) module to solve it. To achieve finer detection even when the training set contains only a small number of clips with little co-occurrence of the categories, we optimize the high-level feature space of cATP-MIL by disentangling it, based on class-wise identifiable information in the training set, into multiple different subspaces. Experiments show that our approach achieves competitive performance on Task 4 of the DCASE 2018 challenge.
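To make the MIL formulation concrete, the following is a minimal sketch of a generic class-wise attention pooling step, not the exact cATP module of this paper. It assumes that frame-level class probabilities and per-class attention logits have already been produced by some network; the names `frame_probs`, `attn_logits`, and `classwise_attention_pool` are hypothetical. For each class, the attention logits are normalized over time with a softmax, and the clip-level probability is the resulting weighted average of the frame-level probabilities.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classwise_attention_pool(frame_probs, attn_logits):
    """Aggregate frame-level predictions into clip-level predictions.

    frame_probs[t][c]: probability of class c in frame t (assumed given).
    attn_logits[t][c]: unnormalized attention score of frame t for class c
                       (assumed given).
    Returns a list clip_probs[c] with one probability per class.
    """
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    clip_probs = []
    for c in range(n_classes):
        # Each class gets its own attention distribution over the frames.
        weights = softmax([attn_logits[t][c] for t in range(n_frames)])
        clip_probs.append(
            sum(w * frame_probs[t][c] for t, w in enumerate(weights))
        )
    return clip_probs


# Toy clip with 3 frames and 2 classes (illustrative values only).
frame_probs = [[0.9, 0.1], [0.8, 0.1], [0.1, 0.7]]
attn_logits = [[2.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
clip_probs = classwise_attention_pool(frame_probs, attn_logits)
```

Because every class has its own weights over time, a class that is active in only a few frames can still dominate its own clip-level score, which is why this kind of pooling suits weak (clip-level) labels. In this sketch the per-class weights also indicate which frames contributed most to each class.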