A Comprehensive Survey of Automated Audio Captioning

05/11/2022
by   Xuenan Xu, et al.
0

Automated audio captioning, a task that mimics human perception as well as innovatively links audio processing and natural language processing, has overseen much progress over the last few years. Audio captioning requires recognizing the acoustic scene, primary audio events and sometimes the spatial and temporal relationship between events in an audio clip. It also requires describing these elements by a fluent and vivid sentence. Deep learning-based approaches are widely adopted to tackle this problem. This current paper situates itself as a comprehensive review covering the benchmark datasets, existing deep learning techniques and the evaluation metrics in automated audio captioning.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset