Andrew Rouditchenko

research

∙ 05/21/2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages

Recent models such as XLS-R and Whisper have made multilingual speech te...

0 Andrew Rouditchenko, et al. ∙

research

∙ 03/29/2023

What, when, and where? – Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Spatio-temporal grounding describes the task of localizing events in spa...

0 Brian Chen, et al. ∙

research

∙ 10/07/2022

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Multilingual text-video retrieval methods have improved significantly in...

0 Andrew Rouditchenko, et al. ∙

research

∙ 12/08/2021

Everything at Once – Multi-modal Fusion Transformer for Video Retrieval

Multi-modal learning from video data has seen increased attention recent...

0 Nina Shvetsova, et al. ∙

research

∙ 12/01/2021

Routing with Self-Attention for Multimodal Capsule Networks

The task of multimodal learning has seen a growing interest recently as ...

0 Kevin Duarte, et al. ∙

research

∙ 11/08/2021

Cascaded Multilingual Audio-Visual Learning from Videos

In this paper, we explore self-supervised audio-visual models that learn...

0 Andrew Rouditchenko, et al. ∙

research

∙ 10/14/2021

Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Visually-grounded spoken language datasets can enable models to learn cr...

12 Ian Palmer, et al. ∙

research

∙ 06/10/2021

Cross-Modal Discrete Representation Learning

Recent advances in representation learning have demonstrated an ability ...

0 Alexander H. Liu, et al. ∙

research

∙ 04/26/2021

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

Multimodal self-supervised learning is getting more and more attention a...

0 Brian Chen, et al. ∙

research

∙ 06/16/2020

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

Current methods for learning visually grounded language from videos ofte...

14 Andrew Rouditchenko, et al. ∙

research

∙ 10/19/2019

Label-efficient audio classification through multitask learning and self-supervision

While deep learning has been incredibly successful in modeling tasks wit...

0 Tyler Lee, et al. ∙

research

∙ 04/18/2019

Self-Supervised Audio-Visual Co-Segmentation

Segmenting objects in images and separating sound sources in audio are c...

0 Andrew Rouditchenko, et al. ∙

research

∙ 04/09/2018

The Sound of Pixels

We introduce PixelPlayer, a system that, by leveraging large amounts of ...

0 Hang Zhao, et al. ∙

