While FastSpeech2 aims to integrate aspects of speech such as pitch, ene...
Building a multilingual Automated Speech Recognition (ASR) system in a l...
In this paper, we introduce UnFuSeD, a novel approach to leverage self-s...
This paper proposes a novel technique to obtain better downstream ASR pe...
We present a new Self-Supervised Learning (SSL) approach to pre-train en...
We present Multiscale Audio Spectrogram Transformer (MAST) for audio cla...
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm...
While Self-Supervised Learning has helped reap the benefit of the scale ...
Self-supervised learning (SSL) based models have been shown to generate ...
Self-supervised learning (SSL) to learn high-level speech representation...
While self-supervised speech representation learning (SSL) models serve ...
The expression of emotions is a crucial part of daily human communicatio...
Emotion Recognition (ER) aims to classify human utterances into differen...
Existing approaches in disfluency detection focus on solving a token-lev...
Inspired by the recent progress in self-supervised learning for computer...
In this paper, we investigate domain adaptation for low-resource Automat...
We introduce DECAR, a self-supervised pre-training approach for learning...
X-vectors have become the standard for speaker-embeddings in automatic s...
End-to-end models are fast replacing conventional hybrid models in autom...
In this paper, a modification to the training process of the popular SPL...