Samuel Thomas

05/21/2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

Recent models such as XLS-R and Whisper have made multilingual speech te...

Andrew Rouditchenko, et al.

04/04/2023

FisHook – An Optimized Approach to Marine Specie Classification using MobileNetV2

Marine ecosystems are vital for the planet's health, but human activitie...

Kohav Dey, et al.

03/29/2023

What, when, and where? – Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Spatio-temporal grounding describes the task of localizing events in spa...

Brian Chen, et al.

10/07/2022

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Multilingual text-video retrieval methods have improved significantly in...

Andrew Rouditchenko, et al.

07/28/2022

Extending RNN-T-based speech recognition systems with emotion and language classification

Speech transcription, emotion recognition, and language identification a...

Zvi Kons, et al.

04/11/2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems

Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) ...

Vishal Sunder, et al.

04/11/2022

Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding

Dialog history plays an important role in spoken language understanding ...

Vishal Sunder, et al.

02/26/2022

Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems

The lack of speech data annotated with labels required for spoken langua...

Samuel Thomas, et al.

02/26/2022

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

Compared to hybrid automatic speech recognition (ASR) systems that use a...

Samuel Thomas, et al.

02/21/2022

A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets

Intent classifiers are vital to the successful operation of virtual agen...

Zvi Kons, et al.

01/28/2022

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

The goal of spoken language understanding (SLU) systems is to determine ...

Hong-Kwang J. Kuo, et al.

12/08/2021

Everything at Once – Multi-modal Fusion Transformer for Video Retrieval

Multi-modal learning from video data has seen increased attention recent...

Nina Shvetsova, et al.

12/01/2021

Routing with Self-Attention for Multimodal Capsule Networks

The task of multimodal learning has seen a growing interest recently as ...

Kevin Duarte, et al.

11/08/2021

Cascaded Multilingual Audio-Visual Learning from Videos

In this paper, we explore self-supervised audio-visual models that learn...

Andrew Rouditchenko, et al.

08/18/2021

Integrating Dialog History into End-to-End Spoken Language Understanding Systems

End-to-end spoken language understanding (SLU) systems that process huma...

Jatin Ganhotra, et al.

04/26/2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

Multimodal self-supervised learning is getting more and more attention a...

Brian Chen, et al.

04/08/2021

RNN Transducer Models For Spoken Language Understanding

We present a comprehensive study on building and adapting RNN transducer...

Samuel Thomas, et al.

04/07/2021

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

A major focus of recent research in spoken language understanding (SLU) ...

Sujeong Cha, et al.

02/19/2021

A Compiler Infrastructure for Accelerator Generators

We present Calyx, a new intermediate language (IL) for compiling high-le...

Rachit Nigam, et al.

11/16/2020

End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

Transformer networks and self-supervised pre-training have consistently ...

Edmilson Morais, et al.

10/08/2020

Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

Training an end-to-end (E2E) neural network speech-to-intent (S2I) syste...

Yinghui Huang, et al.

09/30/2020

End-to-End Spoken Language Understanding Without Full Transcripts

An essential component of spoken language understanding (SLU) is slot fi...

Hong-Kwang J. Kuo, et al.

06/29/2020

Learning Hamiltonian Monte Carlo in R

Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian computatio...

Samuel Thomas, et al.

06/16/2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Current methods for learning visually grounded language from videos ofte...

Andrew Rouditchenko, et al.

04/09/2020

Predictable Accelerator Design with Time-Sensitive Affine Types

Field-programmable gate arrays (FPGAs) provide an opportunity to co-desi...

Rachit Nigam, et al.

04/30/2019

English Broadcast News Speech Recognition by Humans and Machines

With recent advances in deep learning, considerable attention has been g...

Samuel Thomas, et al.

02/19/2019

Layering Data Structures over Skip Graphs for Increased NUMA Locality

We describe an approach for blackbox concurrency based on layering user-...

Samuel Thomas, et al.

11/30/2018

Understanding Unequal Gender Classification Accuracy from Face Images

Recent work shows unequal performance of commercial face classification ...

Vidya Muthukumar, et al.

11/03/2018

SimplerVoice: A Key Message & Visual Description Generator System for Illiteracy

We introduce SimplerVoice: a key message and visual description generato...

Minh N. B. Nguyen, et al.

02/07/2018

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

The performance of automatic speech recognition systems degrades with in...

Xuesong Yang, et al.

09/19/2017

A Recorded Debating Dataset

This paper describes an audio and textual dataset of debating speeches, ...

Shachar Mirkin, et al.

03/06/2017

English Conversational Telephone Speech Recognition by Humans and Machines

One of the most difficult speech recognition tasks is accurate recogniti...

George Saon, et al.

11/27/2016

Invariant Representations for Noisy Speech Recognition

Modern automatic speech recognition (ASR) systems need to be robust unde...

Dmitriy Serdyuk, et al.