This paper addresses the tradeoff between standard accuracy on clean exa...
This paper proposes a novel automatic speech recognition (ASR) system th...
Self-supervised learning (SSL) is the latest breakthrough in speech proc...
End-to-end speech summarization (E2E SSum) is a technique to directly ge...
This paper investigates the effectiveness and implementation of modality...
In this paper, we investigate the semi-supervised joint training of text...
Target speech extraction is a technique to extract the target speaker's ...
There have been many attempts to build multimodal dialog systems that ca...
This paper presents a novel training method for end-to-end scene text re...
This paper presents a novel knowledge distillation method for dialogue s...
We propose a semi-supervised learning method for building end-to-end ric...
We propose cross-modal transformer-based neural correction models that...
In this paper, we present a novel modeling method for single-channel mul...
We present a novel personalized voice activity detection (PVAD) learning...
In this paper, we propose a novel spoken-text-style conversion method th...
We present an audio-visual speech separation learning method that consid...
This paper is the first study to apply deep mutual learning (DML) to end...
This paper presents a novel self-supervised learning method for handling...
We present a novel large-context end-to-end automatic speech recognition...
This paper presents a self-supervised learning method for pointer-genera...
This paper presents a novel fusion method for integrating an external la...
One of the problems with automated audio captioning (AAC) is the indeter...