Jianwu Dang

09/01/2023

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

For fine-grained generation and recognition tasks such as minimally-supe...

Chunyu Qiang, et al.

07/28/2023

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

Recently, there has been a growing interest in text-to-speech (TTS) meth...

Chunyu Qiang, et al.

06/05/2023

Rethinking the visual cues in audio-visual speaker extraction

The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel vi...

Junjie Li, et al.

05/29/2023

Speech and Noise Dual-Stream Spectrogram Refine Network with Speech Distortion Loss for Robust Speech Recognition

In recent years, the joint training of speech enhancement front-end and ...

Haoyu Lu, et al.

03/26/2023

Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

Time-domain speech enhancement (SE) has recently been intensively invest...

Hao Shi, et al.

12/07/2022

MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

Recently, many deep learning based beamformers have been proposed for mu...

Yanjie Fu, et al.

11/02/2022

Monolingual Recognizers Fusion for Code-switching Speech Recognition

The bi-encoder structure has been intensively investigated in code-switc...

Tongtong Song, et al.

10/09/2022

VCSE: Time-Domain Visual-Contextual Speaker Extraction Network

Speaker extraction seeks to extract the target speech in a multi-talker ...

Junjie Li, et al.

07/15/2022

MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources

Recent neural network based Direction of Arrival (DoA) estimation algori...

Haoran Yin, et al.

06/29/2022

Language-specific Characteristic Assistance for Code-switching Speech Recognition

Dual-encoder structure successfully utilizes two language-specific encod...

Tongtong Song, et al.

06/24/2022

Iterative Sound Source Localization for Unknown Number of Sources

Sound source localization aims to seek the direction of arrival (DOA) of...

Yanjie Fu, et al.

04/30/2022

Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning

Heterogeneous graph neural network (HGNN) is a very popular technique fo...

Di Jin, et al.

03/17/2022

TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding

Speaker embedding is an important front-end module to explore discrimina...

Ruiteng Zhang, et al.

02/21/2022

L-SpEx: Localized Target Speaker Extraction

Speaker extraction aims to extract the target speaker's voice from a mul...

Meng Ge, et al.

10/09/2021

Using multiple reference audios and style embedding constraints for speech synthesis

The end-to-end speech synthesis model can directly take an utterance as ...

Cheng Gong, et al.

11/19/2020

Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

Speaker extraction uses a pre-recorded reference speech as the reference...

Meng Ge, et al.

05/10/2020

SpEx+: A Complete Time Domain Speaker Extraction Network

Speaker extraction aims to extract the target speech signal from a multi...

Meng Ge, et al.

05/05/2020

Constructing Accurate and Efficient Deep Spiking Neural Networks with Double-threshold and Augmented Schemes

Spiking neural networks (SNNs) are considered as a potential candidate t...

Qiang Yu, et al.

05/02/2020

Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning

Spikes are the currency in central nervous systems for information trans...

Qiang Yu, et al.

10/23/2019

Relation Modeling with Graph Convolutional Networks for Facial Action Unit Detection

Most existing AU detection works considering AU relationships are relyin...

Zhilei Liu, et al.

02/04/2019

Robust Environmental Sound Recognition with Sparse Key-point Encoding and Efficient Multi-spike Learning

The capability for environmental sound recognition (ESR) can determine t...

Qiang Yu, et al.

03/21/2018

Speech Emotion Recognition Considering Local Dynamic Features

Recently, increasing attention has been directed to the study of the spe...

Haotian Guan, et al.
