Self-supervised learning (SSL) proficiency in speech-related tasks has d...
When labeled data is insufficient, semi-supervised learning with the pse...
The Streaming Unmixing and Recognition Transducer (SURT) model was propo...
This paper presents a novel algorithm for building an automatic speech r...
Guided source separation (GSS) is a type of target-speaker extraction me...
Speech data is notoriously difficult to work with due to a variety of co...
This paper introduces GigaSpeech, an evolving, multi-domain English spee...
This paper introduces a new open-source speech corpus named "speechocean...
We introduce asynchronous dynamic decoder, which adopts an efficient A* ...
This paper proposes a parallel computation strategy and a posterior-base...
Environmental noises and reverberation have a detrimental effect on the ...
Several advances have been made recently towards handling overlapping sp...
We present PyChain, a fully parallelized PyTorch implementation of end-t...
Always-on spoken language interfaces, e.g. personal digital assistants, ...
Speaker diarization is an important pre-processing step for many speech ...
Deep neural network based speaker embeddings, such as x-vectors, have be...
We describe initial work on an extension of the Kaldi toolkit that suppo...
Speech recognition systems for irregularly-spelled languages like Englis...
We describe the neural-network training framework used in the Kaldi spee...
In this paper, we propose a second order optimization method to learn mo...