Roberto Barra-Chicote

research

∙ 07/31/2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Neural text-to-speech systems are often optimized on L1/L2 losses, which...

Guangyan Zhang, et al.

research

∙ 07/23/2023

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

Numerous examples in the literature proved that deep learning models hav...

Iván Vallés-Pérez, et al.

research

∙ 11/04/2022

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

Stuttering is a speech disorder where the natural flow of speech is inte...

Xin Zhang, et al.

research

∙ 07/04/2022

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

In this paper, we propose GlowVC: a multilingual multi-speaker flow-base...

Magdalena Proszewska, et al.

research

∙ 04/06/2022

Prosodic Alignment for off-screen automatic dubbing

The goal of automatic dubbing is to perform speech-to-speech translation...

Yogesh Virkar, et al.

research

∙ 03/15/2022

Text-free non-parallel many-to-many voice conversion using normalising flows

Non-parallel voice conversion (VC) is typically achieved using lossy rep...

Thomas Merritt, et al.

research

∙ 02/16/2022

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

State-of-the-art text-to-speech (TTS) systems require several hours of r...

Adam Gabrys, et al.

research

∙ 10/08/2021

Machine Translation Verbosity Control for Automatic Dubbing

Automatic dubbing aims at seamlessly replacing the speech in a video doc...

Surafel M. Lakew, et al.

research

∙ 06/16/2021

Improving the expressiveness of neural vocoding with non-affine Normalizing Flows

This paper proposes a general enhancement to the Normalizing Flows (NF) ...

Adam Gabrys, et al.

research

∙ 06/14/2021

SynthASR: Unlocking Synthetic Data for Speech Recognition

End-to-end (E2E) automatic speech recognition (ASR) models have recently...

Amin Fazel, et al.

research

∙ 06/10/2021

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

Text-to-speech systems recently achieved almost indistinguishable qualit...

Iván Vallés-Pérez, et al.

research

∙ 01/09/2021

Spanish expressive voices: Corpus for emotion research in spanish

A new emotional multimedia database has been recorded and aligned. The d...

Roberto Barra-Chicote, et al.

research

∙ 01/09/2021

Emotion transplantation through adaptation in HMM-based speech synthesis

This paper proposes an emotion transplantation method capable of modifyi...

Roberto Barra-Chicote, et al.

research

∙ 01/09/2021

Analysis of Statistical Parametric and Unit Selection Speech Synthesis Systems Applied to Emotional Speech

We have applied two state-of-the-art speech synthesis techniques (unit s...

Roberto Barra-Chicote, et al.

research

∙ 12/29/2020

Detection of Lexical Stress Errors in Non-native (L2) English with Data Augmentation and Attention

This paper describes two novel complementary techniques that improve the...

Daniel Korzekwa, et al.

research

∙ 12/17/2020

Parallel WaveNet conditioned on VAE latent vectors

Recently the state-of-the-art text-to-speech synthesis systems have shif...

Jonas Rohnke, et al.

research

∙ 01/19/2020

From Speech-to-Speech Translation to Automatic Dubbing

We present enhancements to a speech-to-speech translation pipeline in or...

Marcello Federico, et al.

research

∙ 11/28/2019

Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech

We propose a Text-to-Speech method to create an unseen expressive style ...

Vatsal Aggarwal, et al.

research

∙ 07/10/2019

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

This paper proposed a novel approach for the detection and reconstructio...

Daniel Korzekwa, et al.

research

∙ 04/04/2019

In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data

Neural text-to-speech synthesis (NTTS) models have shown significant pro...

Nishant Prateek, et al.

research

∙ 11/15/2018

Comprehensive evaluation of statistical speech waveform synthesis

Statistical TTS systems that directly predict the speech waveform have r...

Thomas Merritt, et al.

research

∙ 11/15/2018

Robust universal neural vocoding

This paper introduces a robust universal neural vocoder trained with 74 ...

Jaime Lorenzo-Trueba, et al.
