This paper introduces the HumTrans dataset, which is publicly available ...
Background music (BGM) can enhance a video's emotional impact. However, selecti...
Text-to-music generation (T2M-Gen) faces a major obstacle due to the sca...
Typically, singing voice conversion (SVC) depends on an embedding vector...
Articulatory features are inherently invariant to acoustic signal distor...
Despite the rapid progress of automatic speech recognition (ASR) technol...
Disordered speech recognition is a highly challenging task. The underlyi...
Automatic recognition of disordered speech remains a highly challenging ...
State-of-the-art automatic speech recognition (ASR) system development i...
Automatic recognition of disordered speech remains a highly challenging ...
State-of-the-art neural language models (LMs) represented by Transformer...
Deep neural network (DNN) based automatic speech recognition (ASR) sys...
Automatic recognition of overlapped speech remains a highly challenging ...