Jaemin Cho

research

05/24/2023

Visual Programming for Text-to-Image Generation and Evaluation

As large language models have demonstrated impressive performance in man...

8 Jaemin Cho, et al.

05/18/2023

Paxion: Patching Action Knowledge in Video-Language Foundation Models

Action knowledge involves the understanding of textual, visual, and temp...

3 Zhenhailong Wang, et al.

05/11/2023

Self-Chained Image-Language Model for Video Localization and Question Answering

Recent studies have shown promising results on utilizing pre-trained ima...

3 Shoubin Yu, et al.

04/13/2023

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

Spatial control is a core capability in controllable image generation. A...

4 Jaemin Cho, et al.

03/29/2023

Hierarchical Video-Moment Retrieval and Step-Captioning

There is growing interest in searching for information from large video ...

6 Abhay Zala, et al.

11/21/2022

Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention

We present Perceiver-VL, a vision-and-language framework that efficientl...

6 Zineng Tang, et al.

09/28/2022

TVLT: Textless Vision-Language Transformer

In this work, we present the Textless Vision-Language Transformer (TVLT)...

4 Zineng Tang, et al.

06/13/2022

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

Fine-tuning large pre-trained models on downstream tasks has been adopte...

5 Yi-Lin Sung, et al.

05/26/2022

Fine-grained Image Captioning with CLIP Reward

Modern image captioning models are usually trained with text similarity ...

8 Jaemin Cho, et al.

02/08/2022

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

Generating images from textual descriptions has gained a lot of attentio...

10 Jaemin Cho, et al.

12/20/2021

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

Recently, there has been an increasing interest in building question ans...

14 Revanth Gangi Reddy, et al.

12/13/2021

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

Recently, fine-tuning language models pre-trained on large text corpora ...

2 Yi-Lin Sung, et al.

07/06/2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Since visual perception can give rich information beyond text descriptio...

3 Zineng Tang, et al.

02/04/2021

Unifying Vision-and-Language Tasks via Text Generation

Existing methods for vision-and-language learning typically require desi...

38 Jaemin Cho, et al.

09/23/2020

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

Mirroring the success of masked language models, vision-and-language cou...

4 Jaemin Cho, et al.

09/04/2019

Mixture Content Selection for Diverse Sequence Generation

Generating diverse sequences is important in many NLP applications such ...

0 Jaemin Cho, et al.

04/10/2018

A Hierarchical Latent Structure for Variational Conversation Modeling

Variational autoencoders (VAE) combined with hierarchical RNNs have emer...

0 Yookoon Park, et al.
