Automated Audio Captioning (AAC) is the task of generating natural langu...
Audio-Language models jointly learn multimodal text and audio representa...
In the domain of audio processing, Transfer Learning has facilitated the...
Machine Listening, as usually formalized, attempts to perform a task tha...
Emotions lie on a broad continuum and treating emotions as a discrete nu...
Audio-Text retrieval takes a natural language query to retrieve relevant...
Mainstream Audio Analytics models are trained to learn under the paradig...
COVID-19 has resulted in over 100 million infections and caused worldwid...
In Psychology, actions are paramount for humans to perceive and separate...
Realistic recordings of soundscapes often have multiple sound events
Sounds are essential to how humans perceive and interact with the world ...
Acoustic scene recordings are represented by different types of handcraf...
The largest source of sound events is web videos. Most videos lack sound...
In this paper, we focus on the problem of content-based retrieval for au...
Audio-based multimedia retrieval tasks may identify semantic information...
In this paper we present our work on Task 1 Acoustic Scene Classificat...
Recently, sound recognition has been used to identify sounds, such as ca...
City-identification of videos aims to determine the likelihood of a vide...
The YLI Multimedia Event Detection corpus is a public-domain index of vi...