Dan Su

09/04/2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Mapping two modalities, speech and text, into a shared representation sp...

Jiaxu Zhu, et al.

05/20/2023

Model Debiasing via Gradient-based Explanation on Representation

Machine learning systems produce biased results towards certain demograp...

Jindi Zhang, et al.

04/21/2023

Learn What NOT to Learn: Towards Generative Safety in Chatbots

Conversational models that are generative and open-domain are particular...

Leila Khalatbari, et al.

02/08/2023

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

This paper proposes a framework for quantitatively evaluating interactiv...

Yejin Bang, et al.

12/19/2022

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

We present NusaCrowd, a collaborative initiative to collect and unite ex...

Samuel Cahyawijaya, et al.

12/03/2022

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating...

Yi Lei, et al.

11/15/2022

Generative Long-form Question Answering: Relevance, Faithfulness and Succinctness

In this thesis, we investigated the relevance, faithfulness, and succinc...

Dan Su, et al.

10/14/2022

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Large-scale vision-language pre-trained (VLP) models are prone to halluc...

Wenliang Dai, et al.

10/12/2022

Context Generation Improves Open Domain Question Answering

Closed-book question answering (QA) requires a model to directly answer ...

Dan Su, et al.

10/11/2022

The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022

This paper is the system description of the DKU-Tencent System for the V...

Xiaoyi Qin, et al.

07/13/2022

Cross-Age Speaker Verification: Learning Age-Invariant Speaker Embeddings

Automatic speaker verification has achieved remarkable progress in recen...

Xiaoyi Qin, et al.

07/05/2022

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

The zero-shot scenario for speech generation aims at synthesizing a nove...

Yi Lei, et al.

07/02/2022

Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

Building a voice conversion system for noisy target speakers, such as us...

Liumeng Xue, et al.

06/15/2022

End-to-End Voice Conversion with Information Perturbation

The ideal goal of voice conversion is to convert the source speaker's sp...

Qicong Xie, et al.

06/01/2022

AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation

Speaker adaptation in text-to-speech synthesis (TTS) is to finetune a pr...

Kun Song, et al.

05/12/2022

AiSocrates: Towards Answering Ethical Quandary Questions

Considerable advancements have been made in various NLP tasks based on t...

Yejin Bang, et al.

04/07/2022

3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition

Recently, Conformer based CTC/AED model has become a mainstream architec...

Zhao You, et al.

02/18/2022

VCVTS: Multi-speaker Video-to-Speech synthesis via cross-modal knowledge transfer from voice conversion

Though significant progress has been made for speaker-dependent Video-to...

Disong Wang, et al.

02/14/2022

QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation

Multi-hop question generation (MQG) aims to generate complex questions w...

Dan Su, et al.

02/04/2022

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

This paper describes our speaker diarization system submitted to the Mul...

Naijun Zheng, et al.

01/28/2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Denoising diffusion probabilistic models (DDPMs) are expressive generati...

Songxiang Liu, et al.

12/05/2021

Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI

Recently, End-to-End (E2E) frameworks have achieved remarkable results o...

Jinchuan Tian, et al.

11/23/2021

SpeechMoE2: Mixture-of-Experts Model with Improved Routing

Mixture-of-experts based acoustic models with dynamic routing mechanisms...

Zhao You, et al.

11/14/2021

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

The task of few-shot style transfer for voice cloning in text-to-speech ...

Songxiang Liu, et al.

10/13/2021

Simple Attention Module based Speaker Verification with Iterative noisy label detection

Recently, the attention mechanism such as squeeze-and-excitation module ...

Xiaoyi Qin, et al.

09/08/2021

AppQ: Warm-starting App Recommendation Based on View Graphs

Current app ranking and recommendation systems are mainly based on user-...

Dan Su, et al.

09/08/2021

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

Cross-speaker style transfer (CSST) in text-to-speech (TTS) synthesis ai...

Songxiang Liu, et al.

06/21/2021

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

Current two-stage TTS framework typically integrates an acoustic model w...

Jian Cong, et al.

06/21/2021

Controllable Context-aware Conversational Speech Synthesis

In spoken conversations, spontaneous behaviors like filled pause and pro...

Jian Cong, et al.

06/13/2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

This paper introduces GigaSpeech, an evolving, multi-domain English spee...

Guoguo Chen, et al.

06/11/2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis

For conversational text-to-speech (TTS) systems, it is vital that the sy...

Jingbei Li, et al.

05/28/2021

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

Singing voice conversion (SVC) is one promising technique which can enri...

Songxiang Liu, et al.

05/27/2021

Improve Query Focused Abstractive Summarization by Incorporating Answer Relevance

Query focused summarization (QFS) models aim to generate summaries from ...

Dan Su, et al.

05/13/2021

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters

To diversify and enrich generated dialogue responses, knowledge-grounded...

Yan Xu, et al.

05/08/2021

Latency-Controlled Neural Architecture Search for Streaming Speech Recognition

Recently, neural architecture search (NAS) has attracted much attention ...

Liqiang He, et al.

05/07/2021

SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts

Recently, Mixture of Experts (MoE) based Transformer has shown promising...

Zhao You, et al.

02/12/2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

This paper proposes VARA-TTS, a non-autoregressive (non-AR) text-to-spee...

Peng Liu, et al.

10/28/2020

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

Non-autoregressive (NAR) transformer models have achieved significantly ...

Xingchen Song, et al.

10/28/2020

Replay and Synthetic Speech Detection with Res2net Architecture

Existing approaches for replay and synthetic speech detection still lack...

Xu Li, et al.

10/21/2020

"Are you home alone?" "Yes" Disclosing Security and Privacy Vulnerabilities in Alexa Skills

The home voice assistants such as Amazon Alexa have become increasingly ...

Dan Su, et al.

10/19/2020

Dimsum @LaySumm 20: BART-based Approach for Scientific Document Summarization

Lay summarization aims to generate lay summaries of scientific papers au...

Tiezheng Yu, et al.

10/19/2020

Multi-hop Question Generation with Graph Convolutional Network

Multi-hop Question Generation (QG) aims to generate answer-related quest...

Dan Su, et al.

08/25/2020

Learned Transferable Architectures Can Surpass Hand-Designed Architectures for Large Scale Speech Recognition

In this paper, we explore the neural architecture search (NAS) for autom...

Liqiang He, et al.

06/20/2020

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams

Generating 3D speech-driven talking head has received more and more atte...

Huirong Huang, et al.

06/11/2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

Recently adversarial attacks on automatic speaker verification (ASV) sys...

Xu Li, et al.

05/04/2020

CAiRE-COVID: A Question Answering and Multi-Document Summarization System for COVID-19 Research

To address the need for refined information in COVID-19 pandemic, we pro...

Dan Su, et al.

03/09/2020

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD...

Rongzhi Gu, et al.

10/28/2019

DFSMN-SAN with Persistent Memory Model for Automatic Speech Recognition

Self-attention networks (SAN) have been introduced into automatic speech...

Zhao You, et al.

10/28/2019

Mixup-breakdown: a consistency training method for improving generalization of speech separation models

Deep-learning based speech separation models confront poor generalizatio...

Max W. Y. Lam, et al.

10/23/2019

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks

Self-attention network (SAN) can benefit significantly from the bi-direc...

Xingchen Song, et al.