Acoustic word embeddings are typically created by training a pooling fun...
In Speech Emotion Recognition (SER), textual data is often used alongsid...
English is the most widely spoken language in the world, used daily by
m...
While modern Text-to-Speech (TTS) systems can produce speech rated highl...
In this work, we seek to build effective code-switched (CS) automatic sp...
In this work, we unify several existing decoding strategies for punctuat...
We present a method for cross-lingual training an ASR system using absol...
We present a structured overview of adaptation algorithms for neural
net...
With 24 official EU and many additional languages, multilingualism in Eu...
Speaker adaptive training (SAT) of neural network acoustic models learns...
Raw waveform acoustic modelling has recently gained interest due to neur...
Acoustic model adaptation to unseen test recordings aims to reduce the
m...
In the broadcast domain there is an abundance of related text data and
p...
Active speaker detection is an important component in video analysis
alg...
The performance of automatic speech recognition systems can be improved ...