Transformer-based automatic speech recognition (ASR) models with deep layers exhibi...
Whisper is a powerful automatic speech recognition (ASR) model. Neverthe...
Recent studies on pronunciation scoring have explored the effect of intr...
Very deep models for speaker recognition (SR) have demonstrated remarkab...
Probabilistic linear discriminant analysis (PLDA) is commonly used in sp...
DNN-based models achieve high performance in the speaker verification (S...
Deep convolutional neural networks (CNNs) have been applied to extractin...
The capability of generating speech with a specific type of emotion is des...
Online exams via video conference software like Zoom have been adopted i...
Pooling is needed to aggregate frame-level features into utterance-level...
The performance of child speech recognition is generally less satisfacto...
State-of-the-art speaker verification (SV) systems use a back-end model to s...
This study extends our previous work on text-based speech editing to dev...
This paper presents a macroscopic approach to automatic detection of spe...
Human speech production encompasses physiological processes that natural...
Psychoacoustic studies have shown that locally time-reversed (LTR) speec...
This study aims at designing an environment-aware text-to-speech (TTS) s...
In the development of neural text-to-speech systems, model pre-training ...
The paper presents a novel approach to refining similarity scores betwee...
Confidence measure is a performance index of particular importance for a...
Acoustic scene classification (ASC) aims to identify the type of scene (...
This paper describes a novel design of a neural network-based speech gen...
This paper presents the design, implementation and evaluation of a speec...
Speech sound disorder (SSD) refers to a type of developmental disorder i...
Despite the widespread utilization of deep neural networks (DNNs) for sp...
This paper presents the CUHK-EE voice cloning system for ICASSP 2021 M2V...
A key task for speech recognition systems is to reduce the mismatch betw...
Spoken term discovery from untranscribed speech audio could be achieved ...
This technical report describes our submission to the 2021 SLT Children ...
The present study tackles the problem of automatically discovering spoke...
Categorical speech emotion recognition is typically performed as a seque...
Human emotional speech is, by its very nature, a variant signal. This re...
Human emotions are inherently ambiguous and impure. When designing syste...
Speech sound disorder (SSD) refers to the developmental disorder in whic...
This paper describes the design and development of CUCHILD, a large-scal...
A speech signal is constituted and shaped by various informative fact...
This research addresses the problem of acoustic modeling of low-resource...
This study tackles unsupervised subword modeling in the zero-resource sc...
This study addresses the problem of unsupervised subword unit discovery ...
Acoustic scene classification is the task of identifying the scene from ...
Audio classification is the task of identifying the sound categories tha...