Recent advancements in speech synthesis have leveraged GAN-based network...
In recent years, large-scale pre-trained speech language models (SLMs) h...
In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that...
Lifelong audio feature extraction involves learning new sound classes in...
Auditory attention decoding (AAD) is a technique used to identify and am...
Large-scale pre-trained language models have been shown to be helpful in...
One-shot voice conversion (VC) aims to convert speech from any source sp...
Text-to-Speech (TTS) has recently seen great progress in synthesizing hi...
We present an unsupervised non-parallel many-to-many voice conversion (V...