Speech-to-text models tend to be trained and evaluated against a single ...
Inverse text normalization (ITN) is used to convert the spoken form outp...
It is well known that many machine learning systems demonstrate bias tow...
This paper presents XLS-R, a large-scale model for cross-lingual speech ...
With 4.5 million hours of English speech from 10 different sources acros...
Representation learning from unlabeled data has been of major interest i...
Hybrid automatic speech recognition (ASR) models are typically sequentia...
Language identification greatly impacts the success of downstream tasks ...
In this paper, we introduce the Kaizen framework that uses a continuousl...
How to leverage dynamic contextual information in end-to-end speech reco...
Although speaker verification has conventionally been an audio-only task...
In this work, to measure the accuracy and efficiency for a latency-contr...
In this work, we exploit speech enhancement for improving a recurrent ne...
End-to-end automatic speech recognition (ASR) models with a single neura...
End-to-end (E2E) systems for automatic speech recognition (ASR), such as...
In this work, we first show that on the widely used LibriSpeech benchmar...
Many semi- and weakly-supervised approaches have been investigated for o...
Videos uploaded on social media are often accompanied by textual descr...
Supervised ASR models have reached unprecedented levels of accuracy, tha...
Towards developing high-performing ASR for low-resource languages, appro...