This study introduces a novel training paradigm, audio difference learni...
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis sy...
End-to-end neural diarization (EEND) with encoder-decoder-based attracto...
Acoustic scene classification (ASC) and sound event detection (SED) are ...
This paper proposes a method for improved CTC inference with searched in...
End-to-end automatic speech recognition (ASR) directly maps input speech...
This paper proposes InterAug: a novel training method for CTC-based ASR ...
This paper proposes CTC-based non-autoregressive ASR with self-condition...
This paper proposes acoustic event detection (AED) with classifier chain...
We propose multi-layer perceptron (MLP)-based architectures suitable for...
Non-autoregressive (NAR) models simultaneously generate multiple outputs...
This paper proposes a novel label-synchronous speech-to-text alignment t...
This paper proposes a method to relax the conditional independence assum...
This paper studies how to learn variational autoencoders with a variety ...
This paper proposes a deep neural network (DNN)-based multi-channel spee...
In this paper, we propose a multi-channel speech source separation with ...
Synthesizing and converting environmental sounds have the potential for ...
In this paper, we propose two mask-based beamforming methods using a dee...
This paper proposes a determined blind source separation method using Ba...
This paper proposes an effective modelling of sound event spectra with a...