Tara Sainath

research

∙ 09/15/2023

Augmenting conformers with structured state space models for online speech recognition

Online speech recognition, where the model only accesses context to the ...

Haozhe Shan, et al.

research

∙ 06/22/2023

AudioPaLM: A Large Language Model That Can Speak and Listen

We introduce AudioPaLM, a large language model for speech understanding ...

Paul K. Rubenstein, et al.

research

∙ 04/19/2023

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale

Unpaired text and audio injection have emerged as dominant methods for i...

Cal Peyser, et al.

research

∙ 03/02/2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

We introduce the Universal Speech Model (USM), a single large model that...

Yu Zhang, et al.

research

∙ 01/11/2023

Dual Learning for Large Vocabulary On-Device ASR

Dual learning is a paradigm for semi-supervised machine learning that se...

Cal Peyser, et al.

research

∙ 11/01/2022

Unified End-to-End Speech Recognition and Endpointing for Fast and Efficient Speech Systems

Automatic speech recognition (ASR) systems typically rely on an external...

Shaan Bijwadia, et al.

research

∙ 09/13/2022

Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification

Language identification is critical for many downstream tasks in automat...

Chao Zhang, et al.

research

∙ 02/18/2021

Echo State Speech Recognition

We propose automatic speech recognition (ASR) models inspired by echo st...

Harsh Shrivastava, et al.

research

∙ 11/06/2019

A comparison of end-to-end models for long-form speech recognition

End-to-end automatic speech recognition (ASR) models, including both att...

Chung-Cheng Chiu, et al.

research

∙ 04/30/2019

Deep Learning for Audio Signal Processing

Given the recent surge in developments of deep learning, this article pr...

Hendrik Purwins, et al.

research

∙ 02/21/2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Lingvo is a TensorFlow framework offering a complete solution for collab...

Jonathan Shen, et al.

research

∙ 11/22/2018

Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes

We present two end-to-end models: Audio-to-Byte (A2B) and Byte-to-Audio ...

Bo Li, et al.
