We present a novel integration of an instruction-tuned large language mo...
Achieving high accuracy with low latency has always been a challenge in
...
Collecting sufficient labeled data for spoken language understanding (SL...
This paper presents InterMPL, a semi-supervised learning method for end-t...
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic sp...
This paper presents BERT-CTC, a novel formulation of end-to-end speech
r...
Connectionist Temporal Classification (CTC) is a widely used approach fo...
While Transformers have achieved promising results in end-to-end (E2E)
a...
In the present paper, an attempt is made to combine Mask-CTC and the
tri...
Non-autoregressive (NAR) models simultaneously generate multiple outputs...
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a se...
In end-to-end automatic speech recognition (ASR), a model is expected to...
This article describes an efficient end-to-end speech translation (E2E-S...
Pseudo-labeling (PL) has been shown to be effective in semi-supervised
a...
This paper describes the recent development of ESPnet
(https://github.co...
In this study, we present recent developments on ESPnet: End-to-End Spee...
For real-world deployment of automatic speech recognition (ASR), the sys...
Fast inference speed is an important goal towards real-world deployment ...
We present Mask CTC, a novel non-autoregressive end-to-end automatic spe...