Zero-shot text-to-speech aims at synthesizing voices with unseen speech ...
Direct speech-to-speech translation (S2ST) aims to convert speech from o...
Text-to-speech (TTS) has undergone remarkable improvements in performance...
Speech-to-SQL (S2SQL) aims to convert spoken questions into SQL queries ...
Improving text representation has attracted much attention to achieve ex...
We are interested in a challenging task, Realistic-Music-Score based Sin...
Generating talking person portraits with arbitrary speech audio is a cru...
Generating photo-realistic video portraits with arbitrary speech audio is...
Direct speech-to-speech translation (S2ST) systems leverage recent progr...
In pop music, accompaniments are usually played by multiple instruments ...