Spoken semantic parsing (SSP) involves generating machine-comprehensible...
End-to-end (E2E) spoken language understanding (SLU) systems that genera...
Recent studies find existing self-supervised speech encoders contain
pri...
Recently, there has been an increasing interest in two-pass streaming
en...
We propose a novel deliberation-based approach to end-to-end (E2E) spoke...
Measuring automatic speech recognition (ASR) system quality is critical ...
How to leverage dynamic contextual information in end-to-end speech
reco...
Word Error Rate (WER) has been the predominant metric used to evaluate t...
End-to-end automatic speech recognition (ASR) models with a single neura...
Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech...
We present an end-to-end speech recognition model that learns interactio...
We present a novel conversational-context aware end-to-end speech recogn...
Conversational context information, higher-level knowledge that spans ac...
Existing speech recognition systems are typically built at the sentence
...
Achieving high accuracy with end-to-end speech recognizers requires care...
Building speech recognizers in multiple languages typically involves
rep...
We propose a novel deep neural network architecture for speech recogniti...
Integration of multiple microphone data is one of the key ways to achiev...
We propose a transfer deep learning (TDL) framework that can transfer th...