The attention-based encoder-decoder (AED) speech recognition model has been ...
Rehearsal-based approaches are a mainstay of continual learning (CL). Th...
Numerous adversarial attack methods have been developed to generate impe...
We propose gated language experts to improve multilingual transformer tr...
During the deployment of deep neural networks (DNNs) on edge devices, ma...
As data become increasingly vital for deep learning, a company would be ...
Automatic Speech Recognition (ASR) systems typically yield output in lex...
It has been well recognized that neural network based image classifiers ...
Streaming end-to-end multi-talker speech recognition aims at transcribin...
Hybrid and end-to-end (E2E) systems have their individual advantages, wi...
Text-only adaptation of an end-to-end (E2E) model remains a challenging ...
Speakers may move around while diarisation is being performed. When a mi...
Previous works have shown that spatial location information can be compl...
Integrating external language models (LMs) into end-to-end (E2E) models ...
In this paper, several works are proposed to address practical challenge...
In multi-talker scenarios such as meetings and conversations, speech pro...
The efficacy of external language model (LM) integration with existing e...
End-to-end multi-talker speech recognition is an emerging research trend...
External language model (LM) integration remains a challenging task...
Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end...
This paper describes the Microsoft speaker diarization system for monaur...
We propose speaker separation using speaker inventories and estimated sp...
Because of its streaming nature, recurrent neural network transducer (RN...
While recurrent neural networks still largely define state-of-the-art sp...
Recently, the recurrent neural network transducer (RNN-T) architecture h...
We propose a novel neural label embedding (NLE) scheme for the domain ad...
Recently, a few novel streaming attention-based sequence-to-sequence (S2...
While the community keeps promoting end-to-end models over conventional ...
To facilitate the deployment of deep neural networks (DNNs) on resource-...
Recurrent neural network (RNN)-based automatic speech recognition has ...
Structured weight pruning is a representative model compression techniqu...
Accelerating DNN execution on various resource-limited computing platfor...
Teacher-student (T/S) learning has been shown to be effective for domain adaptation of...
Predicting words and subword units (WSUs) as the output has been shown to be ...
This paper describes a system that generates speaker-annotated transcrip...
We propose three regularization-based speaker adaptation approaches to a...
In the last few years, an emerging trend in automatic speech recognition...
We propose self-teaching networks to improve the generalization capacity...
We introduce the PyKaldi2 speech recognition toolkit, implemented based on Ka...
The cloud-based speech recognition/API provides developers or enterprise...
We propose a novel adversarial speaker adaptation (ASA) scheme, in which...
The use of deep networks to extract embeddings for speaker recognition h...
Adversarial domain-invariant training (ADIT) proves to be effective in s...
Teacher-student (T/S) learning has been shown to be effective for a ...
We propose two approaches for speaker adaptation in end-to-end (E2E) aut...
The acoustic-to-word model based on the Connectionist Temporal Classific...
Feature mapping using deep neural networks is an effective approach for ...
Feature-mapping with deep neural networks is commonly used for single-ch...
It is popular to stack LSTM layers to get better modeling power, especia...
In this study, we develop the keyword spotting (KWS) and acoustic model ...