Daniel Povey

research

∙ 09/14/2023

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Self-supervised learning (SSL) proficiency in speech-related tasks has d...

Yifan Yang, et al.

research

∙ 08/12/2023

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

When labeled data is insufficient, semi-supervised learning with the pse...

Han Zhu, et al.

research

∙ 06/18/2023

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

The Streaming Unmixing and Recognition Transducer (SURT) model was propo...

Desh Raj, et al.

research

∙ 06/01/2023

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

This paper presents a novel algorithm for building an automatic speech r...

Dongji Gao, et al.

research

∙ 12/10/2022

GPU-accelerated Guided Source Separation for Meeting Transcription

Guided source separation (GSS) is a type of target-speaker extraction me...

Desh Raj, et al.

research

∙ 10/25/2021

Lhotse: a speech data representation library for the modern deep learning ecosystem

Speech data is notoriously difficult to work with due to a variety of co...

Piotr Żelasko, et al.

research

∙ 06/13/2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

This paper introduces GigaSpeech, an evolving, multi-domain English spee...

Guoguo Chen, et al.

research

∙ 04/03/2021

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

This paper introduces a new open-source speech corpus named "speechocean...

Junbo Zhang, et al.

research

∙ 03/16/2021

An Asynchronous WFST-Based Decoder For Automatic Speech Recognition

We introduce asynchronous dynamic decoder, which adopts an efficient A* ...

Hang Lv, et al.

research

∙ 03/08/2021

A Parallelizable Lattice Rescoring Strategy with Neural Language Models

This paper proposes a parallel computation strategy and a posterior-base...

Ke Li, et al.

research

∙ 11/04/2020

Frustratingly Easy Noise-aware Training of Acoustic Models

Environmental noises and reverberation have a detrimental effect on the ...

Desh Raj, et al.

research

∙ 11/03/2020

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Several advances have been made recently towards handling overlapping sp...

Desh Raj, et al.

research

∙ 05/20/2020

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

We present PyChain, a fully parallelized PyTorch implementation of end-t...

Yiwen Shao, et al.

research

∙ 05/17/2020

Wake Word Detection with Alignment-Free Lattice-Free MMI

Always-on spoken language interfaces, e.g. personal digital assistants, ...

Yiming Wang, et al.

research

∙ 02/14/2020

Speaker Diarization with Region Proposal Network

Speaker diarization is an important pre-processing step for many speech ...

Zili Huang, et al.

research

∙ 09/13/2019

Probing the Information Encoded in x-vectors

Deep neural network based speaker embeddings, such as x-vectors, have be...

Desh Raj, et al.

research

∙ 04/09/2018

A GPU-based WFST Decoder with Exact Lattice Generation

We describe initial work on an extension of the Kaldi toolkit that suppo...

Zhehuai Chen, et al.

research

∙ 06/12/2017

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

Speech recognition systems for irregularly-spelled languages like Englis...

Xiaohui Zhang, et al.

research

∙ 10/27/2014

Parallel training of DNNs with Natural Gradient and Parameter Averaging

We describe the neural-network training framework used in the Kaldi spee...

Daniel Povey, et al.

research

∙ 11/18/2011

Krylov Subspace Descent for Deep Learning

In this paper, we propose a second order optimization method to learn mo...

Oriol Vinyals, et al.

