Xin Eric Wang

research

∙ 05/29/2023

Photoswap: Personalized Subject Swapping in Images

In an era where images and visual content dominate our digital landscape...

0 Jing Gu, et al. ∙

research

∙ 05/24/2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

Attaining a high degree of user controllability in visual generation oft...

6 Weixi Feng, et al. ∙

research

∙ 05/23/2023

R2H: Building Multimodal Navigation Helpers that Respond to Help

The ability to assist humans during a navigation task in a supportive ro...

0 Yue Fan, et al. ∙

research

∙ 05/18/2023

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

The field of text-to-image (T2I) generation has garnered significant att...

0 Wanrong Zhu, et al. ∙

research

∙ 05/18/2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Existing automatic evaluation on text-to-image synthesis can only provid...

0 Yujie Lu, et al. ∙

research

∙ 05/18/2023

Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible perfor...

0 Xuehai He, et al. ∙

research

∙ 05/02/2023

Multimodal Procedural Planning via Dual Text-Image Prompting

Embodied agents have achieved prominent performance in following human i...

5 Yujie Lu, et al. ∙

research

∙ 05/02/2023

Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment

Pre-trained vision and language models such as CLIP have witnessed remar...

0 Zhen Zhang, et al. ∙

research

∙ 04/30/2023

Multimodal Graph Transformer for Multimodal Question Answering

Despite the success of Transformer models in vision and language tasks, ...

2 Xuehai He, et al. ∙

research

∙ 01/30/2023

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

The ability to accurately locate and navigate to a specific object is a ...

0 Kaiwen Zhou, et al. ∙

research

∙ 12/09/2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

Large-scale diffusion models have achieved state-of-the-art results on t...

0 Weixi Feng, et al. ∙

research

∙ 11/27/2022

Navigation as the Attacker Wishes? Towards Building Byzantine-Robust Embodied Agents under Federated Learning

Federated embodied agent learning protects the data privacy of individua...

0 Yunchao Zhang, et al. ∙

research

∙ 11/25/2022

ComCLIP: Training-Free Compositional Image and Text Matching

Contrastive Language-Image Pretraining (CLIP) has demonstrated great zer...

0 Kenan Jiang, et al. ∙

research

∙ 10/19/2022

CPL: Counterfactual Prompt Learning for Vision and Language Models

Prompt tuning is a new few-shot transfer learning technique that only tu...

0 Xuehai He, et al. ∙

research

∙ 10/07/2022

Visualize Before You Write: Imagination-Guided Open-Ended Text Generation

Recent advances in text-to-image synthesis make it possible to visualize...

4 Wanrong Zhu, et al. ∙

research

∙ 09/10/2022

Anticipating the Unseen Discrepancy for Vision and Language Navigation

Vision-Language Navigation requires the agent to follow natural language...

0 Yujie Lu, et al. ∙

research

∙ 08/28/2022

JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents

Building a conversational embodied agent to execute real-life tasks has ...

50 Kaizhi Zheng, et al. ∙

research

∙ 06/17/2022

VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation

Benefiting from language flexibility and compositionality, humans natura...

0 Kaizhi Zheng, et al. ∙

research

∙ 06/06/2022

Neuro-Symbolic Causal Language Planning with Commonsense Prompting

Language planning aims to implement complex high-level goals by decompos...

0 Yujie Lu, et al. ∙

research

∙ 05/24/2022

Aerial Vision-and-Dialog Navigation

The ability to converse with humans and follow commands in natural langu...

0 Yue Fan, et al. ∙

research

∙ 04/18/2022

Imagination-Augmented Natural Language Understanding

Human brains integrate linguistic and perceptual information simultaneou...

4 Yujie Lu, et al. ∙

research

∙ 03/29/2022

Parameter-efficient Fine-tuning for Vision Transformers

In computer vision, it has achieved great success in adapting large-scal...

2 Xuehai He, et al. ∙

research

∙ 03/28/2022

FedVLN: Privacy-preserving Federated Vision-and-Language Navigation

Data privacy is a central problem for embodied agents that can perceive ...

0 Kaiwen Zhou, et al. ∙

research

∙ 03/24/2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

Temporal grounding in videos aims to localize one target video segment t...

6 Juncheng Li, et al. ∙

research

∙ 03/22/2022

Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

A long-term goal of AI research is to build intelligent agents that can ...

0 Jing Gu, et al. ∙

research

∙ 12/02/2021

Relational Graph Learning for Grounded Video Description Generation

Grounded video description (GVD) encourages captioning models to attend ...

0 Wenqiao Zhang, et al. ∙

research

∙ 06/21/2021

CUDA-GR: Controllable Unsupervised Domain Adaptation for Gaze Redirection

The aim of gaze redirection is to manipulate the gaze in an image to the...

0 Swati Jindal, et al. ∙

research

∙ 06/10/2021

ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

Automatic evaluations for natural language generation (NLG) conventional...

16 Wanrong Zhu, et al. ∙

research

∙ 06/08/2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

Most existing video-and-language (VidL) research focuses on a single dat...

3 Linjie Li, et al. ∙

research

∙ 06/01/2021

Language-Driven Image Style Transfer

Despite having promising results, style transfer, which requires prepari...

0 Tsu-Jui Fu, et al. ∙

research

∙ 04/02/2021

Language-based Video Editing via Multi-Modal Multi-Level Transformer

Video editing tools are widely used nowadays for digital design. Althoug...

0 Tsu-Jui Fu, et al. ∙

research

∙ 03/30/2021

Diagnosing Vision-and-Language Navigation: What Really Matters

Vision-and-language navigation (VLN) is a multimodal task where an agent...

10 Wanrong Zhu, et al. ∙

research

∙ 02/03/2021

L2C: Describing Visual Differences Needs Semantic Understanding of Individuals

Recent advances in language and vision push forward the research of capt...

0 An Yan, et al. ∙

research

∙ 10/07/2020

Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations

A major challenge in visually grounded language generation is to build r...

0 Wanrong Zhu, et al. ∙

research

∙ 09/28/2020

Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation

Vision-and-Language Navigation (VLN) is a natural language grounding tas...

0 Jiannan Xiang, et al. ∙

research

∙ 09/21/2020

SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning

Iterative Language-Based Image Editing (IL-BIE) tasks follow iterative i...

0 Tsu-Jui Fu, et al. ∙
