Takaaki Saeki

research

∙ 09/15/2023

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

This paper proposes a method for extracting a lightweight subset from a ...

Kentaro Seki, et al.

research

∙ 02/27/2023

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Pause insertion, also known as phrase break prediction and phrasing, is ...

Dong Yang, et al.

research

∙ 01/30/2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

While neural text-to-speech (TTS) has achieved human-like natural synthe...

Takaaki Saeki, et al.

research

∙ 12/08/2022

SpeechLMScore: Evaluating speech generation using speech language model

While human evaluation is the most reliable metric for evaluating speech...

Soumi Maiti, et al.

research

∙ 10/27/2022

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

This paper proposes Virtuoso, a massively multilingual speech-text joint...

Takaaki Saeki, et al.

research

∙ 10/26/2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

This paper proposes a method for selecting training data for text-to-spe...

Kentaro Seki, et al.

research

∙ 10/18/2022

Spontaneous speech synthesis with linguistic-speech consistency training using pseudo-filled pauses

We propose a training method for spontaneous speech synthesis models tha...

Yuta Matsunaga, et al.

research

∙ 10/14/2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

We present a comprehensive empirical study for personalized spontaneous ...

Yuta Matsunaga, et al.

research

∙ 04/05/2022

UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022

We present the UTokyo-SaruLab mean opinion score (MOS) prediction system...

Takaaki Saeki, et al.

research

∙ 03/29/2022

DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning

Most text-to-speech (TTS) methods use high-quality speech corpora record...

Takaaki Saeki, et al.

research

∙ 03/28/2022

vTTS: visual-text to speech

This paper proposes visual-text to speech (vTTS), a method for synthesiz...

Yoshifumi Nakano, et al.

research

∙ 03/24/2022

SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

We present a self-supervised speech restoration method without paired sp...

Takaaki Saeki, et al.

research

∙ 03/18/2022

Personalized filled-pause generation with group-wise prediction models

In this paper, we propose a method to generate personalized filled pause...

Yuta Matsunaga, et al.

research

∙ 12/17/2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

In this paper, we construct a new Japanese speech corpus called "JTubeSp...

Shinnosuke Takamichi, et al.

research

∙ 10/15/2021

ESPnet2-TTS: Extending the Edge of TTS Research

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...

Tomoki Hayashi, et al.

research

∙ 09/22/2021

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network

Incremental text-to-speech (TTS) synthesis generates utterances in small...

Takaaki Saeki, et al.

research

∙ 12/23/2020

Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model

Text-to-speech (TTS) synthesis, a technique for artificially generating ...

Takaaki Saeki, et al.

research

∙ 02/17/2020

Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

In this paper, we propose computationally efficient and high-quality met...

Takaaki Saeki, et al.
