Jaime Lorenzo-Trueba

research

∙ 07/31/2023

Multilingual context-based pronunciation learning for Text-to-Speech

Phonetic information and linguistic knowledge are an essential component...

Giulia Comini, et al.

research

∙ 07/31/2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Neural text-to-speech systems are often optimized on L1/L2 losses, which...

Guangyan Zhang, et al.

research

∙ 07/31/2023

Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input in...

Manuel Sam Ribeiro, et al.

research

∙ 07/29/2022

Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

The availability of data in expressive styles across languages is limite...

Giulia Comini, et al.

research

∙ 07/02/2022

Computer-assisted Pronunciation Training – Speech synthesis is almost all you need

The research community has long studied computer-assisted pronunciation ...

Daniel Korzekwa, et al.

research

∙ 02/16/2022

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

State-of-the-art text-to-speech (TTS) systems require several hours of r...

Adam Gabrys, et al.

research

∙ 02/10/2022

Cross-speaker style transfer for text-to-speech using data augmentation

We address the problem of cross-speaker style transfer for text-to-speec...

Manuel Sam Ribeiro, et al.

research

∙ 08/13/2021

Enhancing audio quality for expressive Neural Text-to-Speech

Artificial speech synthesis has made a great leap in terms of naturalnes...

Abdelhamid Ezzerg, et al.

research

∙ 06/16/2021

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

Voice Conversion (VC) is a technique that aims to transform the non-ling...

Alejandro Mottini, et al.

research

∙ 06/14/2021

A learned conditional prior for the VAE acoustic space of a TTS system

Many factors influence speech yielding different renditions of a given s...

Penny Karanasou, et al.

research

∙ 06/07/2021

Weakly-supervised word-level pronunciation error detection in non-native English speech

We propose a weakly-supervised model for word-level mispronunciation det...

Daniel Korzekwa, et al.

research

∙ 04/15/2021

Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Developing Text Normalization (TN) systems for Text-to-Speech (TTS) on n...

Shubhi Tyagi, et al.

research

∙ 01/16/2021

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

A common approach to the automatic detection of mispronunciation in lang...

Daniel Korzekwa, et al.

research

∙ 01/14/2021

EmoCat: Language-agnostic Emotional Voice Conversion

Emotional voice conversion models adapt the emotion in speech without ch...

Bastian Schnell, et al.

research

∙ 12/29/2020

Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention

This paper describes two novel complementary techniques that improve the...

Daniel Korzekwa, et al.

research

∙ 12/17/2020

Parallel WaveNet conditioned on VAE latent vectors

Recently the state-of-the-art text-to-speech synthesis systems have shif...

Jonas Rohnke, et al.

research

∙ 11/11/2020

Low-resource expressive text-to-speech using data augmentation

While recent neural text-to-speech (TTS) systems perform remarkably well...

Goeric Huybrechts, et al.

research

∙ 12/11/2019

Voice Conversion for Whispered Speech Synthesis

We present an approach to synthesize whisper by applying a handcrafted s...

Marius Cotescu, et al.

research

∙ 12/02/2019

Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

Recent advances in Text-to-Speech (TTS) have improved quality and natura...

Shubhi Tyagi, et al.

research

∙ 11/28/2019

Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

We propose a Text-to-Speech method to create an unseen expressive style ...

Vatsal Aggarwal, et al.

research

∙ 11/10/2019

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

Nowadays vast amounts of speech data are recorded from low-quality recor...

Seyyed Saeed Sarfjoo, et al.

research

∙ 04/04/2019

In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data

Neural text-to-speech synthesis (NTTS) models have shown significant pro...

Nishant Prateek, et al.

research

∙ 11/15/2018

Effect of data reduction on sequence-to-sequence neural TTS

Recent speech synthesis systems based on sampling from autoregressive ne...

Javier Latorre, et al.

research

∙ 11/15/2018

Robust universal neural vocoding

This paper introduces a robust universal neural vocoder trained with 74 ...

Jaime Lorenzo-Trueba, et al.

research

∙ 04/23/2018

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

Voice conversion (VC) aims at conversion of speaker characteristic witho...

Tomi Kinnunen, et al.

research

∙ 04/12/2018

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

We present the Voice Conversion Challenge 2018, designed as a follow up ...

Jaime Lorenzo-Trueba, et al.

research

∙ 04/07/2018

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

Recent advances in speech synthesis suggest that limitations such as the...

Xin Wang, et al.

research

∙ 04/02/2018

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

Although voice conversion (VC) algorithms have achieved remarkable succe...

Fuming Fang, et al.

research

∙ 03/02/2018

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

Thanks to the growing availability of spoofing databases and rapid advan...

Jaime Lorenzo-Trueba, et al.
