Xixin Wu

research

∙ 09/21/2023

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speake...

Shun Lei, et al.

research

∙ 08/31/2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) ...

Jie Chen, et al.

research

∙ 08/29/2023

Rethinking Machine Ethics – Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

Making moral judgments is an essential step toward developing ethical AI...

Jingyan Zhou, et al.

research

∙ 07/29/2023

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

Expressive speech synthesis is crucial for many human-computer interacti...

Shun Lei, et al.

research

∙ 05/25/2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

Multi-talker overlapped speech poses a significant challenge for speech ...

Lingwei Meng, et al.

research

∙ 04/07/2023

Interpretable Unified Language Checking

Despite recent concerns about undesirable behaviors generated by large l...

Tianhua Zhang, et al.

research

∙ 03/14/2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

As a common way of emotion signaling via non-linguistic vocalizations, v...

Jinchao Li, et al.

research

∙ 03/14/2023

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

With the global population aging rapidly, Alzheimer's disease (AD) is pa...

Jinchao Li, et al.

research

∙ 02/20/2023

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

Although automatic speech recognition (ASR) can perform well in common n...

Lingwei Meng, et al.

research

∙ 02/02/2023

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

Homophone characters are common in tonal syllable-based languages, such ...

Holam Chung, et al.

research

∙ 10/25/2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE

We propose an unsupervised learning method to disentangle speech into co...

Hui Lu, et al.

research

∙ 06/28/2022

Exploring linguistic feature and model combination for speech recognition based automatic AD detection

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating p...

Yi Wang, et al.

research

∙ 03/31/2022

Neural Architecture Search for Speech Emotion Recognition

Deep neural networks have brought significant advancements to speech emo...

Xixin Wu, et al.

research

∙ 03/29/2022

Spoofing-Aware Speaker Verification by Multi-Level Fusion

Recently, many novel techniques have been introduced to deal with spoofi...

Haibin Wu, et al.

research

∙ 03/08/2022

Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors

Emotion recognition is a key attribute for artificial intelligence syste...

Wen Wu, et al.

research

∙ 02/18/2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Dysarthric speech reconstruction (DSR), which aims to improve the qualit...

Disong Wang, et al.

research

∙ 02/04/2022

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our speaker diarization system submitted to the Mul...

Naijun Zheng, et al.

research

∙ 11/08/2021

Characterizing the adversarial vulnerability of speech self-supervised learning

A leaderboard named Speech processing Universal PERformance Benchmark (S...

Haibin Wu, et al.

research

∙ 07/19/2021

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

Existing approaches for anti-spoofing in automatic speaker verification ...

Xu Li, et al.

research

∙ 07/07/2021

VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis

This paper describes a variational auto-encoder based non-autoregressive...

Hui Lu, et al.

research

∙ 04/02/2021

Attention Forcing for Machine Translation

Auto-regressive sequence-to-sequence models with attention mechanisms ha...

Qingyun Dou, et al.

research

∙ 01/13/2021

Should Ensemble Members Be Calibrated?

Underlying the use of statistical approaches for a wide range of applica...

Xixin Wu, et al.

research

∙ 09/06/2020

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

This paper proposes an any-to-many location-relative, sequence-to-sequen...

Songxiang Liu, et al.

research

∙ 06/11/2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

Recently adversarial attacks on automatic speaker verification (ASV) sys...

Xu Li, et al.

research

∙ 04/08/2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

Speaker verification systems usually suffer from the mismatch problem be...

Xu Li, et al.

research

∙ 02/01/2020

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech

Second language (L2) speech is often labeled with the native, phone cate...

Xu Li, et al.

research

∙ 11/08/2019

Adversarial Attacks on GMM i-vector based Speaker Verification Systems

This work investigates the vulnerability of Gaussian Mixture Model (GMM...

Xu Li, et al.

research

∙ 08/30/2019

Maximizing Mutual Information for Tacotron

End-to-end speech synthesis methods such as Tacotron, Tacotron2 and Trans...

Peng Liu, et al.
