Dongchao Yang

research

∙ 09/03/2023

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

The goal of speech enhancement (SE) is to eliminate the background inter...

Wen Wang, et al.

research

∙ 05/30/2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Various applications of voice synthesis have been developed independentl...

Rongjie Huang, et al.

research

∙ 05/29/2023

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

Large diffusion models have been successful in text-to-audio (T2A) synth...

Jiawei Huang, et al.

research

∙ 05/04/2023

HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec

Audio codec models are widely used in audio communication as a crucial t...

Dongchao Yang, et al.

research

∙ 04/25/2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

Large language models (LLMs) have exhibited remarkable capabilities acro...

Rongjie Huang, et al.

research

∙ 03/10/2023

Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss

In text-audio retrieval (TAR) tasks, due to the heterogeneity of content...

Yifei Xin, et al.

research

∙ 03/10/2023

Improving Weakly Supervised Sound Event Detection with Causal Intervention

Existing weakly supervised sound event detection (WSSED) work has not ex...

Yifei Xin, et al.

research

∙ 01/31/2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Expressive text-to-speech (TTS) aims to synthesize different speaking st...

Dongchao Yang, et al.

research

∙ 01/30/2023

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Large-scale multimodal generative modeling has created milestones in tex...

Rongjie Huang, et al.

research

∙ 11/04/2022

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Expressive text-to-speech (TTS) can synthesize a new speaking style by i...

Dongchao Yang, et al.

research

∙ 07/20/2022

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Generating sound effects that humans want is an important topic. However...

Dongchao Yang, et al.

research

∙ 04/15/2022

Speaker-Aware Mixture of Mixtures Training for Weakly Supervised Speaker Extraction

Dominant researches adopt supervised training for speaker extraction, wh...

Zifeng Zhao, et al.

research

∙ 04/05/2022

RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection

Target sound detection (TSD) aims to detect the target sound from a mixt...

Dongchao Yang, et al.

research

∙ 04/05/2022

A Two-student Learning Framework for Mixed Supervised Target Sound Detection

Target sound detection (TSD) aims to detect the target sound from mixtur...

Dongchao Yang, et al.

research

∙ 04/04/2022

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Recently, end-to-end speaker extraction has attracted increasing attenti...

Zifeng Zhao, et al.

research

∙ 04/02/2022

Improving Target Sound Extraction with Timestamp Information

Target sound extraction (TSE) aims to extract the sound part of a target...

Helin Wang, et al.

research

∙ 12/19/2021

Detect what you want: Target Sound Detection

Human beings can perceive a target sound that we are interested in from ...

Dongchao Yang, et al.

research

∙ 10/12/2021

Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information

Automated audio captioning (AAC) has developed rapidly in recent years, ...

Zhongjie Ye, et al.

research

∙ 10/09/2021

A Mutual learning framework for Few-shot Sound Event Detection

Although prototypical network (ProtoNet) has proved to be an effective m...

Dongchao Yang, et al.

research

∙ 05/21/2021

Unsupervised Multi-Target Domain Adaptation for Acoustic Scene Classification

It is well known that the mismatch between training (source) and test (t...

Dongchao Yang, et al.

research

∙ 10/18/2020

Towards Data Distillation for End-to-end Spoken Conversational Question Answering

In spoken question answering, QA systems are designed to answer question...

Chenyu You, et al.
