Speaker diarization has gained considerable attention within speech
proc...
In this paper, we explored how to boost speech emotion recognition (SER)...
In spite of the excellent strides made by end-to-end (E2E) models in spe...
This paper presents FunCodec, a fundamental neural speech codec toolkit,...
Multi-Modal automatic speech recognition (ASR) techniques aim to leverag...
The exponential growth of data, alongside advancements in model structur...
Hotword customization is one of the important issues remaining in the ASR fie...
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel vi...
In recent years, the joint training of speech enhancement front-end and
...
The recently proposed serialized output training (SOT) simplifies
multi-...
Recently, speaker-attributed automatic speech recognition (SA-ASR) has
a...
For speech interaction, voice activity detection (VAD) is often used as ...
This paper introduces FunASR, an open-source speech recognition toolkit
...
Estimating confidence scores for recognition results is a classic task i...
Recently, end-to-end neural diarization (EEND) has been introduced and achieve...
Conventional ASR systems use frame-level phoneme posterior to conduct
fo...
In this paper, we propose a novel multi-modal multi-task encoder-decoder...
As an important data selection schema, active learning emerges as the
es...
Recently, hybrid systems of clustering and neural diarization models hav...
Transformers have achieved tremendous success in various computer vision...
Speaker-attributed automatic speech recognition (SA-ASR) in multiparty
m...
Recently cross-channel attention, which better leverages multi-channel
s...
Active learning is an important technology for automated machine learnin...
Transformers have recently dominated the ASR field. Although able to yie...
In this paper, we conduct a comparative study on speaker-attributed auto...
Overlapping speech diarization has been traditionally treated as a
multi...
This work presents an extended version of the Vehicle Energy Dataset (VE...
The concept of differential privacy has widely penetrated academia and
i...
Differential privacy (DP) has been the de-facto standard to preserve
pri...
Expressive text-to-speech (TTS) has become a hot research topic recently...
The ICASSP 2022 Multi-channel Multi-party Meeting Transcription Grand
Ch...
Overlapping speech diarization is always treated as a multi-label
classi...
Electric vehicles (EVs) are emerging to promote an eco-sustainable society.
Ne...
Recent developments in speech signal processing, such as speech recogniti...
We propose BeamTransformer, an efficient architecture to leverage
beamfo...
There has been a recent surge of research interest in attacking the prob...
Person re-identification (re-ID) in the scenario with large spatial and
...
Recent works show that mean-teaching is an effective framework for
unsup...
Recently, end-to-end (E2E) speech recognition has become popular, since ...
The Transformer is showing its superiority over convolutional architectures ...
Most unsupervised person Re-Identification (Re-ID) works produce
pseu...
Domain adaptive person Re-Identification (ReID) is challenging owing to ...
Recently, online end-to-end ASR has gained increasing attention. However...
Unsupervised domain adaptive person Re-IDentification (ReID) is challeng...
It is challenging to bridge the performance gap between Binary CNN (BCNN...
Transformer models have been introduced into end-to-end speech recogniti...
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) ha...
End-to-end speech recognition has become popular in recent years, since ...
Various factors like occlusions, backgrounds, etc., would lead to misali...
The challenge of unsupervised person re-identification (ReID) lies in
le...