Deep Learning Frameworks Applied For Audio-Visual Scene Classification

06/12/2021
by Lam Pham, et al.

In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and show how individual visual and audio features, as well as their combination, affect SC performance. Our extensive experiments, conducted on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve a best classification accuracy of 82.2% with audio input only, with further results reported for visual input only and for combined audio-visual input. The highest classification accuracy of 93.9%, obtained with both audio-based and visual-based frameworks, shows an improvement of 16.5% over the DCASE baseline.
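
To illustrate what combining audio-based and visual-based predictions can look like, the following is a minimal sketch of late fusion by weighted averaging of per-class probabilities. The abstract does not state how the two frameworks are combined, so the fusion rule, the function names `fuse_probabilities` and `predict`, the `audio_weight` parameter, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_probabilities(
    audio_probs: Sequence[float],
    visual_probs: Sequence[float],
    audio_weight: float = 0.5,
) -> List[float]:
    """Weighted average of two per-class probability vectors.

    Assumption: both models output probabilities over the same
    ordered set of scene classes. This is an illustrative fusion
    rule, not necessarily the one used in the paper.
    """
    if len(audio_probs) != len(visual_probs):
        raise ValueError("Both inputs must cover the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return [
        audio_weight * a + visual_weight * v
        for a, v in zip(audio_probs, visual_probs)
    ]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Made-up probabilities for three hypothetical scene classes.
audio = [0.6, 0.3, 0.1]   # audio-only model favours class 0
visual = [0.2, 0.7, 0.1]  # visual-only model favours class 1
fused = fuse_probabilities(audio, visual)
fused_class = predict(fused)
```

With equal weights, the fused vector favours the class on which the two models jointly place the most probability, which can differ from the choice of either model alone. The weight could be tuned on held-out data if one modality is more reliable than the other.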
