Neural text-to-speech (TTS) generally consists of cascaded architecture ...
Dataset distillation aims to generate small datasets with little informa...
We previously proposed contextual spelling correction (CSC) to correct t...
We introduce a language modeling approach for text to speech synthesis (...
Current text to speech (TTS) systems usually leverage a cascaded acousti...
This paper proposes a new "decompose-and-edit" paradigm for the text-bas...
Text to speech (TTS) has made rapid progress in both academia and indust...
Contextual biasing is an important and challenging task for end-to-end a...
It's challenging to customize transducer-based automatic speech recognit...
Custom voice, a specific text to speech (TTS) service in commercial spee...
Because of its streaming nature, recurrent neural network transducer (RN...
To speed up the inference of neural speech synthesis, non-autoregressive...
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotro...