Although recent advancements in diffusion models have enabled high-fidelity a...
Self-supervised learning (SSL) for speech representation has been
succes...
End-to-end speech summarization (E2E SSum) directly summarizes input spe...
This paper proposes a novel automatic speech recognition (ASR) system th...
Self-supervised learning (SSL) is the latest breakthrough in speech
proc...
Self-supervised learning (SSL) has been dramatically successful not only...
End-to-end speech summarization (E2E SSum) is a technique to directly
ge...
Siamese-network-based self-supervised learning (SSL) suffers from slow
c...
Self-supervised learning (SSL) is seen as a very promising approach with...
Target speech extraction is a technique to extract the target speaker's ...
This paper presents a novel training method for end-to-end scene text
re...
This paper presents a novel knowledge distillation method for dialogue
s...
We propose a semi-supervised learning method for building end-to-end ric...
We propose cross-modal transformer-based neural correction models that...
In this paper, we present a novel modeling method for single-channel
mul...
We present a novel personalized voice activity detection (PVAD) learning...
In this paper, we propose a novel spoken-text-style conversion method th...
In the deployment of scene-text spotting systems on mobile platforms,
li...
We present an audio-visual speech separation learning method that consid...
This paper is the first study to apply deep mutual learning (DML) to
end...
This paper presents a novel self-supervised learning method for handling...
We present a novel large-context end-to-end automatic speech recognition...
This paper presents a self-supervised learning method for pointer-genera...
This paper presents a novel fusion method for integrating an external
la...