This report presents the technical details of our submission on the EGO4...
Large-scale, weakly-supervised speech recognition models, such as Whispe...
We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations cap...
In egocentric videos, actions occur in quick succession. We capitalise o...
The objective of this work is speaker diarisation of speech recordings '...
This report describes our submission to the VoxCeleb Speaker Recognition...
The goal of this work is to train robust speaker recognition models with...
The goal of this paper is speaker diarisation of videos collected 'in th...
The goal of this work is to train effective representations for keyword ...
The objective of this paper is 'open-set' speaker recognition of unseen ...
Musical onset detection can be formulated as a time-to-event (TTE) or ti...
Research on content and style representations has been widely studied in...
Research in speaker recognition has recently seen significant progress d...
Most deep learning-based models for speech enhancement have mainly focus...