This paper proposes a method for extracting a lightweight subset from a ...
Pause insertion, also known as phrase break prediction and phrasing, is ...
While neural text-to-speech (TTS) has achieved human-like natural synthe...
While human evaluation is the most reliable metric for evaluating speech...
This paper proposes Virtuoso, a massively multilingual speech-text joint...
This paper proposes a method for selecting training data for text-to-spe...
We propose a training method for spontaneous speech synthesis models tha...
We present a comprehensive empirical study for personalized spontaneous ...
We present the UTokyo-SaruLab mean opinion score (MOS) prediction system...
Most text-to-speech (TTS) methods use high-quality speech corpora record...
This paper proposes visual-text to speech (vTTS), a method for synthesiz...
We present a self-supervised speech restoration method without paired sp...
In this paper, we propose a method to generate personalized filled pause...
In this paper, we construct a new Japanese speech corpus called "JTubeSp...
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...
Incremental text-to-speech (TTS) synthesis generates utterances in small...
Text-to-speech (TTS) synthesis, a technique for artificially generating ...
In this paper, we propose computationally efficient and high-quality met...