Cross-modal retrieval (CMR) has been extensively applied in various doma...
Most existing sandstorm image enhancement methods are based on tradition...
This paper integrates graph-to-sequence into an end-to-end text-to-speec...
Generating realistic talking faces is a complex and widely discussed tas...
The rise of the phenomenon of the "right to be forgotten" has prompted r...
Voice conversion is a method that allows for the transformation of speak...
Music Emotion Recognition involves the automatic identification of emoti...
In the realm of Large Language Models, the balance between instruction d...
Voice conversion, as the style transfer task applied to speech, refers to...
Chinese Automatic Speech Recognition (ASR) error correction presents sig...
Conversational Question Answering (CQA) is a challenging task that aims ...
Federated Learning (FL) has attracted wide attention because it enables decentr...
There has been significant progress in emotional Text-To-Speech (TTS) sy...
In recent Text-to-Speech (TTS) systems, a neural vocoder often generates...
Text summarization is essential for information aggregation and demands ...
Out-of-distribution (OOD) detection aims at enhancing standard deep neur...
Deep neural retrieval models have amply demonstrated their power but est...
Deep neural networks have achieved remarkable performance in retrieval-b...
Because of predicting all the target tokens in parallel, the non-autoreg...
Recent expressive text to speech (TTS) models focus on synthesizing emot...
Music genre classification has been widely studied in the past few years for...
Data-Free Knowledge Distillation (DFKD) has recently attracted growing a...
The recent emergence of the joint CTC-Attention model shows significant impr...
Recent advances in pre-trained language models have improved the perform...
Most previous neural text-to-speech (TTS) methods are based on su...
The Metaverse expands the physical world to a new dimension, and the physica...
Recovering the masked speech frames is widely applied in speech represen...
In this paper, we propose Adapitch, a multi-speaker TTS method that mak...
Estimating age from a single speech is a classic and challenging topic. ...
Unsupervised representation learning for speech audio attained impressi...
Since the beginning of the COVID-19 pandemic, remote conferencing and sc...
Pose Guided Human Image Synthesis (PGHIS) is a challenging task of trans...
Machine learning models (mainly neural networks) are used more and more ...
The extraction of sequence patterns from a collection of functionally li...
The Transformer architecture model, based on self-attention and multi-he...
Buddhism is an influential religion with a long-standing history and pro...
Nonparallel multi-domain voice conversion methods such as the StarGAN-VC...
One-shot voice conversion (VC) with only a single target speaker's speec...
Non-parallel many-to-many voice conversion remains an interesting but ch...
Time-domain Transformer neural networks have proven their superiority in...
Speech emotion recognition (SER) has many challenges, but one of the mai...
Although deep neural networks (DNNs) have achieved tremendous success in...
Currently, the federated graph neural network (GNN) has attracted a lot ...
Facial micro-expression recognition has attracted much attention recent...
In this paper, we investigated a speech-augmentation-based unsupervised ...
Low-resource automatic speech recognition (ASR) is a useful but thorny t...
Federated learning (FL) is a paradigm where many clients collaboratively...
Deep learning models have made significant progress in automatic program...
Non-negative matrix factorization (NMF) based topic modeling is widely u...
Pre-trained BERT models have achieved impressive performance in many nat...