Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speake...
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...
Making moral judgments is an essential step toward developing ethical AI...
Expressive speech synthesis is crucial for many human-computer interacti...
Multi-talker overlapped speech poses a significant challenge for speech
...
Despite recent concerns about undesirable behaviors generated by large
l...
As a common way of emotion signaling via non-linguistic vocalizations, v...
With the global population aging rapidly, Alzheimer's disease (AD) is
pa...
Although automatic speech recognition (ASR) can perform well in common
n...
Homophone characters are common in tonal syllable-based languages, such ...
We propose an unsupervised learning method to disentangle speech into co...
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating
p...
Deep neural networks have brought significant advancements to speech emo...
Recently, many novel techniques have been introduced to deal with spoofi...
Emotion recognition is a key attribute for artificial intelligence syste...
Dysarthric speech reconstruction (DSR), which aims to improve the qualit...
This paper describes our speaker diarization system submitted to the
Mul...
A leaderboard named Speech processing Universal PERformance Benchmark
(S...
Existing approaches for anti-spoofing in automatic speaker verification ...
This paper describes a variational auto-encoder based non-autoregressive...
Auto-regressive sequence-to-sequence models with attention mechanisms ha...
Underlying the use of statistical approaches for a wide range of applica...
This paper proposes an any-to-many location-relative, sequence-to-sequen...
Recently adversarial attacks on automatic speaker verification (ASV) sys...
Speaker verification systems usually suffer from the mismatch problem be...
Second language (L2) speech is often labeled with the native, phone
cate...
This work investigates the vulnerability of Gaussian Mixture Model (GMM...
End-to-end speech synthesis methods such as Tacotron, Tacotron2 and
Trans...