In this work, we propose an error correction framework, named DiaCorrect...
Combining end-to-end neural speaker diarization (EEND) with vector clust...
End-to-end (e2e) systems have recently gained wide popularity in automat...
The recently proposed Joint Energy-based Model (JEM) interprets discrimi...
End-to-end diarization presents an attractive alternative to standard ca...
When recognizing emotions from speech, we encounter two common problems:...
Recently, the pre-trained Transformer models have received a rising inte...
In speaker recognition, where speech segments are mapped to embeddings o...
Self-supervised learning of speech representations from large amounts of...
End-to-end neural diarization (EEND) is nowadays one of the most promine...
Dysarthric speech recognition has posed major challenges due to lack of ...
In speaker recognition, where speech segments are mapped to embeddings o...
Motivated by the unconsolidated data situation and the lack of a standard be...
In typical multi-talker speech recognition systems, a neural network-bas...
We propose to express the forward-backward algorithm in terms of operati...
Self-supervised ASR-TTS models suffer in out-of-domain data conditions. ...
Speaker embeddings extracted with deep 2D convolutional neural networks ...
The recently proposed VBx diarization method uses a Bayesian hidden Mark...
We examine the effect of data augmentation for training of language mode...
In this work, we propose a hierarchical subspace model for acoustic unit...
This paper describes the system developed by the BUT team for the fourth...
The paper describes BUT's speech translation systems. The systems ar...
This paper presents a Bayesian multilingual topic model for learning lan...
Speaker embeddings (x-vectors) extracted from very short segments of spe...
This document describes task1 of the Short-Duration Speaker Verification...
DeepMine is a speech database in Persian and English designed to build a...
The majority of text modelling techniques yield only point estimates of ...
In this report, the Brno University of Technology (BUT) team submissions...
This is a description of our effort in VOiCES 2019 Speaker Recognition c...
In this paper, we present the system description of the joint efforts of...
Sequence-to-sequence ASR models require large quantities of data to atta...
This work tackles the problem of learning a set of language specific aco...
Contrary to i-vectors, speaker embeddings such as x-vectors are incapabl...
In this work, we continue in our research on i-vector extractor for spea...
This paper describes our system submitted to SemEval 2019 Task 7: Rumour...
In this work, we present an analysis of a DNN-based autoencoder for spee...
In this paper, we present promising accurate prefix boosting (PAPB), a d...
In this paper we investigate the use of adversarial domain adaptation fo...
Recently, speaker embeddings extracted with deep neural networks became ...
In this work we revisit discriminative training of the i-vector extracto...
In this paper, the Brno University of Technology (BUT) team submissions ...
The task of spoken pass-phrase verification is to decide whether a test ...
Training deep recurrent neural network (RNN) architectures is complicate...
The standard state-of-the-art backend for text-independent speaker recog...
Embeddings in machine learning are low-dimensional representations of co...
Developing speech technologies for low-resource languages has become a v...
Recently several end-to-end speaker verification systems based on deep n...
Acoustic unit discovery (AUD) is a process of automatically identifying ...