Tomoki Hayashi

research

∙ 04/10/2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...

Brian Yan, et al.

research

∙ 07/10/2022

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

We present a large-scale comparative study of self-supervised speech rep...

Wen-Chin Huang, et al.

research

∙ 06/13/2022

Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure

Anomalous sound detection systems must detect unknown, atypical sounds u...

Ibuki Kuroyanagi, et al.

research

∙ 02/17/2022

Acoustic Event Detection with Classifier Chains

This paper proposes acoustic event detection (AED) with classifier chain...

Tatsuya Komatsu, et al.

research

∙ 12/17/2021

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem

Deep learning based models have significantly improved the performance o...

Jing Shi, et al.

research

∙ 11/24/2021

ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations

This work presents a self-supervised method to learn dense semantically ...

Robin Karlsson, et al.

research

∙ 10/15/2021

ESPnet2-TTS: Extending the Edge of TTS Research

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS)...

Tomoki Hayashi, et al.

research

∙ 10/12/2021

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

This paper introduces S3PRL-VC, an open-source voice conversion (VC) fra...

Wen-Chin Huang, et al.

research

∙ 07/20/2021

On Prosody Modeling for ASR+TTS based Voice Conversion

In voice conversion (VC), an approach showing promising results in the l...

Wen-Chin Huang, et al.

research

∙ 06/11/2021

Anomalous Sound Detection Using a Binary Classification Model and Class Centroids

An anomalous sound detection system to detect unknown anomalous sounds u...

Ibuki Kuroyanagi, et al.

research

∙ 04/14/2021

Non-autoregressive sequence-to-sequence voice conversion

This paper proposes a novel voice conversion (VC) method based on non-au...

Tomoki Hayashi, et al.

research

∙ 03/04/2021

crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder

In this paper, we present an open-source software for developing a nonpa...

Kazuhiro Kobayashi, et al.

research

∙ 12/23/2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

This paper describes the recent development of ESPnet (https://github.co...

Shinji Watanabe, et al.

research

∙ 11/07/2020

ESPnet-SE: end-to-end speech enhancement and separation toolkit designed for ASR integration

We present ESPnet-SE, which is designed for the quick development of spe...

Chenda Li, et al.

research

∙ 10/26/2020

Recent Developments on ESPnet Toolkit Boosted by Conformer

In this study, we present recent developments on ESPnet: End-to-End Spee...

Pengcheng Guo, et al.

research

∙ 10/23/2020

Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations

We present a novel approach to any-to-one (A2O) voice conversion (VC) in...

Wen-Chin Huang, et al.

research

∙ 10/06/2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS

This paper presents the sequence-to-sequence (seq2seq) baseline system f...

Wen-Chin Huang, et al.

research

∙ 08/07/2020

Pretraining Techniques for Sequence-to-Sequence Voice Conversion

Sequence-to-sequence (seq2seq) voice conversion (VC) models are attracti...

Wen-Chin Huang, et al.

research

∙ 07/25/2020

Quasi-Periodic Parallel WaveGAN: A Non-autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

In this paper, we propose a quasi-periodic parallel WaveGAN (QPPWG) wave...

Yi-Chiao Wu, et al.

research

∙ 07/11/2020

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

In this paper, a pitch-adaptive waveform generative model named Quasi-Pe...

Yi-Chiao Wu, et al.

research

∙ 05/18/2020

Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation

In this paper, we propose a parallel WaveGAN (PWG)-like neural vocoder w...

Yi-Chiao Wu, et al.

research

∙ 05/12/2020

DiscreTalk: Text-to-Speech as a Machine Translation Problem

This paper proposes a new end-to-end text-to-speech (E2E-TTS) model base...

Tomoki Hayashi, et al.

research

∙ 04/21/2020

ESPnet-ST: All-in-One Speech Translation Toolkit

We present ESPnet-ST, which is designed for the quick development of spe...

Hirofumi Inaguma, et al.

research

∙ 03/26/2020

Non-parallel Voice Conversion System with WaveNet Vocoder and Collapsed Speech Suppression

In this paper, we integrate a simple non-parallel voice conversion (VC) ...

Yi-Chiao Wu, et al.

research

∙ 02/03/2020

End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection

This paper integrates a voice activity detection (VAD) function with end...

Takenori Yoshimura, et al.

research

∙ 12/14/2019

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC...

Wen-Chin Huang, et al.

research

∙ 10/24/2019

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit ...

Tomoki Hayashi, et al.

research

∙ 09/13/2019

A Comparative Study on Transformer vs RNN in Speech Applications

Sequence-to-sequence models have been widely used in end-to-end speech p...

Shigeki Karita, et al.

research

∙ 07/24/2019

Non-Parallel Voice Conversion with Cyclic Variational Autoencoder

In this paper, we present a novel technique for a non-parallel voice con...

Patrick Lumban Tobing, et al.

research

∙ 07/21/2019

Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder

In this paper, we investigate the effectiveness of a quasi-periodic Wave...

Yi-Chiao Wu, et al.

research

∙ 07/01/2019

Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation

In this paper, we propose a quasi-periodic neural network (QPNet) vocode...

Yi-Chiao Wu, et al.

research

∙ 05/02/2019

Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion

In this work, we investigate the effectiveness of two techniques for imp...

Wen-Chin Huang, et al.

research

∙ 11/27/2018

Refined WaveNet Vocoder for Variational Autoencoder Based Voice Conversion

This paper presents a refinement framework of WaveNet vocoders for varia...

Wen-Chin Huang, et al.

research

∙ 11/02/2018

Cycle-consistency training for end-to-end speech recognition

This paper presents a method to train end-to-end automatic speech recogn...

Takaaki Hori, et al.

research

∙ 07/28/2018

Back-Translation-Style Data Augmentation for End-to-End ASR

In this paper we propose a novel data augmentation method for attention-...

Tomoki Hayashi, et al.

research

∙ 04/30/2018

Collapsed speech segment detection and suppression for WaveNet vocoder

In this paper, we propose a technique to alleviate quality degradation c...

Yi-Chiao Wu, et al.

research

∙ 04/22/2018

Multi-Head Decoder for End-to-End Speech Recognition

This paper presents a new network architecture called multi-head decoder...

Tomoki Hayashi, et al.

research

∙ 03/30/2018

ESPnet: End-to-End Speech Processing Toolkit

This paper introduces a new open source platform for end-to-end speech p...

Shinji Watanabe, et al.
