Zhaoheng Ni

research

∙ 09/19/2023

FoleyGen: Visually-Guided Audio Generation

Recent advancements in audio generation have been spurred by the evoluti...

Xinhao Mei, et al.

research

∙ 09/15/2023

Stack-and-Delay: a new codebook pattern for music generation

In language modeling based music generation, a generated waveform is rep...

Gaël Le Lan, et al.

research

∙ 06/11/2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

Self-supervised learning (SSL) has led to great strides in speech proces...

William Chen, et al.

research

∙ 05/22/2023

Scaling Speech Technology to 1,000+ Languages

Expanding the language coverage of speech technology has the potential t...

Vineel Pratap, et al.

research

∙ 05/15/2023

Ripple sparse self-attention for monaural speech enhancement

The use of Transformer represents a recent success in speech enhancement...

Qiquan Zhang, et al.

research

∙ 04/10/2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...

Brian Yan, et al.

research

∙ 07/19/2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

This paper presents recent progress on integrating speech separation and...

Yen-Ju Lu, et al.

research

∙ 02/24/2022

Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge

This paper describes our submission to the L3DAS22 Challenge Task 1, whi...

Yen-Ju Lu, et al.

research

∙ 11/15/2021

Time-Frequency Attention for Monaural Speech Enhancement

Most studies on speech enhancement generally don't consider the energy d...

Qiquan Zhang, et al.

research

∙ 10/28/2021

TorchAudio: Building Blocks for Audio and Speech Processing

This document describes version 0.10 of torchaudio: building blocks for ...

Yao-Yuan Yang, et al.

research

∙ 12/02/2020

Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement

Recurrent neural networks using the LSTM architecture can achieve signif...

Felix Grezes, et al.

research

∙ 12/02/2020

Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks

Spatial clustering techniques can achieve significant multi-channel nois...

Zhaoheng Ni, et al.

research

∙ 12/02/2020

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks

Recent works have shown that Deep Recurrent Neural Networks using the LS...

Felix Grezes, et al.

research

∙ 11/03/2019

Onssen: an open-source speech separation and enhancement library

Speech separation is an essential task for multi-talker speech recogniti...

Zhaoheng Ni, et al.
