Recently, there have been significant advancements in voice conversion, ...
Recent strides in neural speech synthesis technologies, while enjoying w...
Auditory Attention Detection (AAD) aims to detect the target speaker from br...
Audio deepfake detection is an emerging active topic. A growing number o...
The rhythm of synthetic speech is usually too smooth, which causes that ...
Current fake audio detection algorithms have achieved promising performa...
End-to-end model, especially Recurrent Neural Network Transducer (RNN-T)...
Self-supervised speech models are a rapidly developing research topic in...
The rapid advancement of spoofing algorithms necessitates the developmen...
Audio deepfake detection is an emerging topic in the artificial intellig...
Current fake audio detection relies on hand-crafted features, which lose...
Existing fake audio detection systems perform well in in-domain testing,...
Over the past few decades, multimodal emotion recognition has made remar...
In this paper, we propose a novel self-distillation method for fake spee...
Text-to-speech (TTS) and voice conversion (VC) are two different tasks b...
Text-based speech editing allows users to edit speech by intuitively cut...
Previous databases have been designed to further the development of fake...
There are already some datasets used for fake audio detection, such as t...
Current end-to-end code-switching Text-to-Speech (TTS) can already gener...
Many effective attempts have been made for deepfake audio detection. How...
Many effective attempts have been made for fake audio detection. However...
The existing fake audio detection systems often rely on expert experienc...
Recently, pioneering research works have proposed a large number of acousti...
Fake audio detection is a growing concern and some relevant datasets hav...
The traditional vocoders have the advantages of high synthesis efficienc...
The text-based speech editor allows the editing of speech through intuit...
Audio deepfake detection is an emerging topic, which was included in the...
End-to-end singing voice synthesis (SVS) is attractive due to the avoida...
Code-switching is about dealing with alternative languages in the commun...
Fake audio attacks have become a major threat to the speaker verification sys...
Diverse promising datasets have been designed to hold back the developme...
Transducer-based models, such as RNN-Transducer and transformer-transduc...
The autoregressive (AR) models, such as attention-based encoder-decoder ...
Attention-based encoder-decoder (AED) models have achieved promising per...
Recurrent neural networks (RNNs) have shown significant improvements in ...
The joint training framework for speech enhancement and recognition meth...
Despite the recent significant advances witnessed in end-to-end (E2E) AS...
Non-autoregressive transformer models have achieved extremely fast infer...
Although attention-based end-to-end models have achieved promising perfo...
Monaural speech dereverberation is a very challenging task because no sp...
Previous studies demonstrate that word embeddings and part-of-speech (PO...
In this paper, we propose an end-to-end post-filter method with deep att...
Recently, language identity information has been utilized to improve the...
Multi-channel deep clustering (MDC) has achieved good performance for ...
For most of the attention-based sequence-to-sequence models, the decoder...
Because an attention-based sequence-to-sequence speech (Seq2Seq) recogni...
Recurrent neural network transducers (RNN-T) have been successfully appl...
Deep clustering (DC) and utterance-level permutation invariant training ...
Integrating an external language model into a sequence-to-sequence speec...
In order to improve the performance for far-field speech recognition, th...