For personalized speech generation, a neural text-to-speech (TTS) model ...
Large, pre-trained representation models trained using self-supervised l...
This paper introduces an end-to-end neural speech restoration model, HD-...
This report describes our submission to Task 2 of the Auditory EEG Decod...
In this paper, we propose a novel end-to-end user-defined keyword spotti...
The Gumbel-softmax distribution, or Concrete distribution, is often used...
The quality of end-to-end neural text-to-speech (TTS) systems highly dep...
Modern neural speech enhancement models usually include various forms of...
ASV (automatic speaker verification) systems are intrinsically required ...
In this paper, we propose an effective method to synthesize speaker-spec...
In this paper, we address the problem of separating individual speech si...
Many approaches can derive information about a single speaker's identity...
In this paper, we propose an effective training strategy to extract rob...
The objective of this paper is to separate a target speaker's speech fro...
The goal of this work is to train discriminative cross-modal embeddings ...
This paper proposes an effective emotion control method for an end-to-en...
In this paper, we propose a deep learning (DL)-based parameter enhanceme...
In this paper, we propose a high-quality generative text-to-speech (TTS)...
Deep clustering is a deep neural network-based speech separation algorit...
We propose a linear prediction (LP)-based waveform generation method via...
This paper proposes a WaveNet-based neural excitation model (ExcitNet) f...
This paper proposes speaker-adaptive neural vocoders for statistical par...
This paper proposes a new strategy for learning powerful cross-modal emb...