For fine-grained generation and recognition tasks such as minimally-supe...
Recently, there has been a growing interest in text-to-speech (TTS) meth...
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel vi...
In recent years, the joint training of speech enhancement front-end and ...
Time-domain speech enhancement (SE) has recently been intensively invest...
Recently, many deep learning-based beamformers have been proposed for mu...
The bi-encoder structure has been intensively investigated in code-switc...
Speaker extraction seeks to extract the target speech in a multi-talker ...
Recent neural network-based Direction of Arrival (DoA) estimation algori...
The dual-encoder structure successfully utilizes two language-specific encod...
Sound source localization aims to seek the direction of arrival (DOA) of...
Heterogeneous graph neural network (HGNN) is a very popular technique fo...
Speaker embedding is an important front-end module to explore discrimina...
Speaker extraction aims to extract the target speaker's voice from a mul...
The end-to-end speech synthesis model can directly take an utterance as ...
Speaker extraction uses a pre-recorded reference speech as the reference...
Speaker extraction aims to extract the target speech signal from a multi...
Spiking neural networks (SNNs) are considered a potential candidate t...
Spikes are the currency in central nervous systems for information trans...
Most existing AU detection works considering AU relationships are relyin...
The capability for environmental sound recognition (ESR) can determine t...
Recently, increasing attention has been directed to the study of the spe...