While transformers and their variant conformers show promising performan...
Audio deepfake detection is an emerging topic, which was included in the...
Fake audio attacks have become a major threat to the speaker verification sys...
Diverse promising datasets have been designed to hold back the developme...
Transducer-based models, such as RNN-Transducer and transformer-transduc...
The autoregressive (AR) models, such as attention-based encoder-decoder ...
Attention-based encoder-decoder (AED) models have achieved promising per...
Despite the recent significant advances witnessed in end-to-end (E2E) AS...
Non-autoregressive transformer models have achieved extremely fast infer...
Although attention-based end-to-end models have achieved promising perfo...
Previous studies demonstrate that word embeddings and part-of-speech (PO...
Recently, language identity information has been utilized to improve the...
For most of the attention-based sequence-to-sequence models, the decoder...
Because an attention-based sequence-to-sequence (Seq2Seq) speech recogni...
Recurrent neural network transducers (RNN-T) have been successfully appl...
Integrating an external language model into a sequence-to-sequence speec...