Current speaker recognition systems primarily rely on supervised approac...
End-to-end models, such as the neural Transducer, have been successful i...
Computational complexity is critical when deploying deep learning-based
...
Echo cancellation and noise reduction are essential for full-duplex
comm...
Automatic speech recognition (ASR) based on transducers is widely used. ...
This paper summarizes the cinematic demixing (CDX) track of the Sound
De...
This paper summarizes the music demixing (MDX) track of the Sound Demixi...
A key challenge in dysarthric speech recognition is the speaker-level
di...
Expressive text-to-speech (TTS) can synthesize a new speaking style by
i...
Sequence-to-Sequence (seq2seq) tasks transcribe the input sequence to a
...
The performance of music source separation (MSS) models has been greatly...
The training of modern speech processing systems often requires a large
...
Generating sound effects that humans want is an important topic. However...
Despite the rapid progress in automatic speech recognition (ASR) researc...
Despite the rapid advance of automatic speech recognition (ASR) technolo...
Target sound extraction (TSE) aims to extract the sound part of a target...
In automatic speech recognition (ASR) research, discriminative criteria ...
Automatic recognition of dysarthric and elderly speech remains highly challengin...
Despite the rapid progress of automatic speech recognition (ASR) technol...
Disordered speech recognition is a highly challenging task. The underlyi...
Automatic recognition of disordered speech remains a highly challenging ...
State-of-the-art automatic speech recognition (ASR) system development i...
Despite the rapid progress of end-to-end (E2E) automatic speech recognit...
Recently, End-to-End (E2E) frameworks have achieved remarkable results o...
State-of-the-art language models (LMs) represented by long short-term me...
Recognition of overlapped speech has been a highly challenging task to d...
State-of-the-art neural language models represented by Transformers are
...
The high memory consumption and computational costs of recurrent neural
...
Language understanding in speech-based systems has attracted much atten...
Automatic recognition of disordered speech remains a highly challenging ...
State-of-the-art neural language models (LMs) represented by Transformer...
Speaker verification systems usually suffer from the mismatch problem be...
Automatic recognition of overlapped speech remains a highly challenging ...
This work investigates the vulnerability of Gaussian Mixture Model (GMM...