Non-autoregressive (NAR) modeling has gained significant interest in spe...
Transferring the knowledge of large language models (LLMs) is a promisin...
We improve on the popular conformer architecture by replacing the depthw...
Beam search, which is the dominant ASR decoding algorithm for end-to-end...
Speech transcription, emotion recognition, and language identification a...
We report on aggressive quantization strategies that greatly accelerate
...
Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have ...
We introduce two techniques, length perturbation and n-best based label
...
The lack of speech data annotated with labels required for spoken langua...
Compared to hybrid automatic speech recognition (ASR) systems that use a...
The goal of spoken language understanding (SLU) systems is to determine ...
Large-scale distributed training of deep acoustic models plays an import...
Automatic speech recognition (ASR) is a capability which enables a progr...
We investigate the impact of aggressive low-precision representations of...
When recurrent neural network transducers (RNNTs) are trained using the
...
End-to-end spoken language understanding (SLU) systems that process
huma...
In our previous work we demonstrated that a single headed attention
enco...
We present a comprehensive study on building and adapting RNN transducer...
We investigate a set of techniques for RNN Transducers (RNN-Ts) that wer...
The past decade has witnessed great progress in Automatic Speech Recogni...
Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchr...
It is generally believed that direct sequence-to-sequence (seq2seq) spee...
There has been huge progress in speech recognition over the last several...
Modern Automatic Speech Recognition (ASR) systems rely on distributed de...
With recent advances in deep learning, considerable attention has been g...
In this paper, we propose and investigate a variety of distributed deep
...
Direct acoustics-to-word (A2W) models in the end-to-end paradigm have
re...
An embedding-based speaker adaptive training (SAT) approach is proposed ...
Language models (LMs) based on Long Short Term Memory (LSTM) have shown ...
Recent work on end-to-end automatic speech recognition (ASR) has shown t...
One of the most difficult speech recognition tasks is accurate recogniti...
We describe a collection of acoustic and language modeling techniques th...
Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Ne...