Self-supervised learning (SSL) proficiency in speech-related tasks has d...
Although diffusion models in text-to-speech have become a popular choice...
Although high-fidelity speech can be obtained for intralingual speech sy...
Recently, end-to-end (E2E) automatic speech recognition (ASR) models hav...
The utilization of discrete speech tokens, divided into semantic tokens ...
In this paper, we describe the systems developed by the SJTU X-LANCE tea...
Although current neural text-to-speech (TTS) models are able to generate...
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, ...
Although word-level prosody modeling in neural text-to-speech (TTS) has ...
Generating natural speech with diverse and smooth prosody patterns is a c...
Recent research on both utterance-level and phone-level prosody modell...
Training a code-switching end-to-end automatic speech recognition (ASR) ...