The goal of speech enhancement (SE) is to eliminate the background inter...
Various applications of voice synthesis have been developed independentl...
Large diffusion models have been successful in text-to-audio (T2A) synth...
Audio codec models are widely used in audio communication as a crucial t...
Large language models (LLMs) have exhibited remarkable capabilities acro...
In text-audio retrieval (TAR) tasks, due to the heterogeneity of content...
Existing weakly supervised sound event detection (WSSED) work has not ex...
Expressive text-to-speech (TTS) aims to synthesize different speaking st...
Large-scale multimodal generative modeling has created milestones in tex...
Expressive text-to-speech (TTS) can synthesize a new speaking style by i...
Generating sound effects that humans want is an important topic. However...
Most existing research adopts supervised training for speaker extraction, wh...
Target sound detection (TSD) aims to detect the target sound from a mixt...
Target sound detection (TSD) aims to detect the target sound from mixtur...
Recently, end-to-end speaker extraction has attracted increasing attenti...
Target sound extraction (TSE) aims to extract the sound part of a target...
Humans can perceive a target sound of interest from ...
Automated audio captioning (AAC) has developed rapidly in recent years, ...
Although the prototypical network (ProtoNet) has proved to be an effective m...
It is well known that the mismatch between training (source) and test (t...
In spoken question answering, QA systems are designed to answer question...