Humans often speak in a continuous manner which leads to coherent and
co...
This paper proposes an expressive singing voice synthesis system by
intr...
Disentanglement of a speaker's timbre and style is very important for st...
We demonstrate ViDA-MAN, a digital-human agent for multi-modal interacti...
End-to-end Automatic Speech Recognition (ASR) models are usually trained...
Despite prosody is related to the linguistic information up to the disco...
This paper presents a novel reranking model, future reward reranking, to...