Recent models such as XLS-R and Whisper have made multilingual speech
te...
Marine ecosystems are vital for the planet's health, but human activitie...
Spatio-temporal grounding describes the task of localizing events in spa...
Multilingual text-video retrieval methods have improved significantly in...
Speech transcription, emotion recognition, and language identification a...
Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) ...
Dialog history plays an important role in spoken language understanding ...
The lack of speech data annotated with labels required for spoken langua...
Compared to hybrid automatic speech recognition (ASR) systems that use a...
Intent classifiers are vital to the successful operation of virtual agen...
The goal of spoken language understanding (SLU) systems is to determine ...
Multi-modal learning from video data has seen increased attention recent...
The task of multimodal learning has seen a growing interest recently as ...
In this paper, we explore self-supervised audio-visual models that learn...
End-to-end spoken language understanding (SLU) systems that process
huma...
Multimodal self-supervised learning is getting more and more attention a...
We present a comprehensive study on building and adapting RNN transducer...
A major focus of recent research in spoken language understanding (SLU) ...
We present Calyx, a new intermediate language (IL) for compiling high-le...
Transformer networks and self-supervised pre-training have consistently
...
Training an end-to-end (E2E) neural network speech-to-intent (S2I) syste...
An essential component of spoken language understanding (SLU) is slot
fi...
Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian computatio...
Current methods for learning visually grounded language from videos ofte...
Field-programmable gate arrays (FPGAs) provide an opportunity to co-desi...
With recent advances in deep learning, considerable attention has been g...
We describe an approach for blackbox concurrency based on layering
user-...
Recent work shows unequal performance of commercial face classification
...
We introduce SimplerVoice: a key message and visual description generato...
The performance of automatic speech recognition systems degrades with
in...
This paper describes an audio and textual dataset of debating speeches, ...
One of the most difficult speech recognition tasks is accurate recogniti...
Modern automatic speech recognition (ASR) systems need to be robust unde...