We introduce EM-Network, a novel self-distillation approach that effecti...
This study aims to develop a single integrated spoofing-aware speaker ve...
Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate ...
Recently, the advance in deep learning has brought a considerable improv...
Several recently proposed text-to-speech (TTS) models have succeeded in genera...
The domain mismatch problem caused by speaker-unrelated features has been a m...
Pre-training with self-supervised models, such as Hidden-unit BERT (HuBE...
Recent state-of-the-art speaker verification architectures adopt multi-s...
Training a text-to-speech (TTS) model requires a large-scale text labele...
Knowledge distillation (KD), best known as an effective method for model...
Most speech-to-text (S2T) translation studies use English speech as a so...
Although neural text-to-speech (TTS) models have attracted a lot of atte...
Paraphrasing is often performed with less concern for controlled style c...
Photoplethysmogram (PPG) signal-based blood pressure (BP) estimation is ...
In this paper, we propose a simple but powerful unsupervised learning me...
This paper describes our submission to Task 1 of the Short-duration Spea...
Over the recent years, various deep learning-based embedding methods hav...
For multi-channel speech recognition, speech enhancement techniques such...
Recently, attention-based encoder-decoder (AED) models have shown state-...
Flow-based generative models are composed of invertible transformations ...
In recent years, various flow-based generative models have been proposed...
Speech is one of the most effective means of communication and is full o...
Modern dialog managers face the challenge of having to fulfill human-lev...
Understanding the intention of an utterance is challenging for some pros...
Different from the writing systems of many Romance and Germanic language...
Ethics regarding social bias has recently raised striking issues in natu...
For a large portion of real-life utterances, the intention cannot be sol...
For readability and possibly for disambiguation, appropriate word segmen...
Intention identification and slot filling is a core issue in dialog mana...