Yale Song

research

∙ 07/11/2023

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Video-language pre-training (VLP) has become increasingly important due ...

0 Shraman Pramanick, et al. ∙

research

∙ 02/03/2023

Egocentric Video Task Translation @ Ego4D Challenge 2022

This technical report describes the EgoTask Translation approach that ex...

0 Zihui Xue, et al. ∙

research

∙ 12/13/2022

Egocentric Video Task Translation

Different video understanding tasks are typically treated in isolation, ...

0 Zihui Xue, et al. ∙

research

∙ 10/21/2022

Video Summarization Overview

With the broad growth of video capturing devices and applications on the...

0 Mayu Otani, et al. ∙

research

∙ 07/22/2022

Neural-Sim: Learning to Generate Training Data with NeRF

Training computer vision models usually requires collecting and labeling...

5 Yunhao Ge, et al. ∙

research

∙ 07/11/2022

Scaling Novel Object Detection with Weakly Supervised Detection Transformers

Weakly supervised object detection (WSOD) enables object detectors to be...

6 Tyler LaBonte, et al. ∙

research

∙ 04/23/2022

Visual Attention Emerges from Recurrent Sparse Reconstruction

Visual attention helps achieve robust perception under noise, corruption...

6 Baifeng Shi, et al. ∙

research

∙ 01/12/2022

Robust Contrastive Learning against Noisy Views

Contrastive learning relies on an assumption that positive pairs contain...

12 Ching-Yao Chuang, et al. ∙

research

∙ 04/07/2021

Contrastive Learning of Global and Local Audio-Visual Representations

Contrastive learning has delivered impressive results in many audio-visu...

0 Shuang Ma, et al. ∙

research

∙ 01/28/2021

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Creating presentation materials requires complex multimodal reasoning sk...

0 Tsu-Jui Fu, et al. ∙

research

∙ 01/26/2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning

Large-scale datasets are the cornerstone of self-supervised representati...

8 Sangho Lee, et al. ∙

research

∙ 12/08/2020

Parameter Efficient Multimodal Transformers for Video Representation Learning

The recent success of Transformers in the language domain has motivated ...

0 Sangho Lee, et al. ∙

research

∙ 12/03/2020

Learning to Transfer Visual Effects from Videos to Images

We study the problem of animating images by transferring spatio-temporal...

0 Christopher Thomas, et al. ∙

research

∙ 10/25/2019

Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

Current multi-reference style transfer models for Text-to-Speech (TTS) p...

0 Matt Whitehill, et al. ∙

research

∙ 08/19/2019

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck

Deep generative models have led to significant advances in cross-modal g...

8 Shuang Ma, et al. ∙

research

∙ 08/05/2019

Image to Video Domain Adaptation Using Web Supervision

Training deep neural networks typically requires large amounts of labele...

0 Andrew Kae, et al. ∙

research

∙ 07/09/2019

M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

Generative adversarial networks have led to significant advances in cros...

0 Shuang Ma, et al. ∙

research

∙ 06/11/2019

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval

Visual-semantic embedding aims to find a shared latent space where relat...

5 Yale Song, et al. ∙

research

∙ 07/07/2018

Video Prediction with Appearance and Motion Conditions

Video prediction aims to generate realistic future frames by learning dy...

0 Yunseok Jang, et al. ∙

research

∙ 04/12/2018

Cross-Modal Retrieval with Implicit Concept Association

Traditional cross-modal retrieval assumes explicit association of concep...

0 Yale Song, et al. ∙

research

∙ 01/27/2018

Image2GIF: Generating Cinemagraphs using Recurrent Deep Q-Networks

Given a still photograph, one can imagine how dynamic objects might move...

0 Yipin Zhou, et al. ∙

research

∙ 08/23/2017

ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets

Video consumption is being shifted from sit-and-watch to selective skimm...

0 Haojian Jin, et al. ∙

research

∙ 04/11/2017

Improving Pairwise Ranking for Multi-label Image Classification

Learning to rank has recently emerged as an attractive technique to trai...

0 Yuncheng Li, et al. ∙

research

∙ 03/07/2017

Learning from Noisy Labels with Distillation

The ability of learning from noisy labels is very useful in many visual ...

0 Yuncheng Li, et al. ∙

research

∙ 11/27/2016

Real-Time Video Highlights for Yahoo Esports

Esports has gained global popularity in recent years and several compani...

0 Yale Song, et al. ∙

research

∙ 04/25/2016

Balancing Appearance and Context in Sketch Interpretation

We describe a sketch interpretation system that detects and classifies c...

0 Yale Song, et al. ∙

research

∙ 04/10/2016

TGIF: A New Dataset and Benchmark on Animated GIF Description

With the recent popularity of animated GIFs on social media, there is ne...

0 Yuncheng Li, et al. ∙
