Data-driven approaches hold promise for audio captioning. However, the d...
Audio super-resolution is a fundamental task that predicts high-frequenc...
Fish feeding intensity assessment (FFIA) aims to evaluate the intensity ...
Diffusion models have shown promising results in cross-modal generation ...
Sounds carry an abundance of information about activities and events in ...
Automatic detection and classification of animal sounds has many applica...
Universal source separation (USS) is a fundamental research task for com...
The advancement of audio-language (AL) multimodal learning tasks has bee...
This study defines a new evaluation metric for audio tagging tasks to ov...
Audio captioning is the task of generating captions that describe the co...
The audio spectrogram is a time-frequency representation that has been w...
Recently, there has been increasing interest in building efficient audio...
Few-shot audio event detection is a task that detects the occurrence tim...
Few-shot bioacoustic event detection is a task that detects the occurren...
Binaural audio plays a significant role in constructing immersive augmen...
Text-to-speech (TTS) has made rapid progress in both academia and indust...
Speech restoration aims to remove distortions in speech signals. Prior m...
In this paper, we introduce the task of language-queried audio source se...
Speech super-resolution (SR) is a task to increase speech sampling rate ...
Audio captioning aims at using natural language to describe the content ...
Music source separation (MSS) shows active progress with deep learning m...
Speech restoration aims to remove distortions in speech signals. Prior m...
Deep neural network based methods have been successfully applied to musi...
Acoustic echo and background noise can seriously degrade the intelligibi...
Speech enhancement is a task to improve the intelligibility and perceptu...
This paper presents a new input format, channel-wise subband input (CWS)...