Longteng Guo

research

∙ 08/23/2023

EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

Building scalable vision-language models to learn from diverse, multimod...

0 Junyi Chen, et al. ∙

research

∙ 05/25/2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst

Building general-purpose models that can perceive diverse real-world mod...

0 Zijia Zhao, et al. ∙

research

∙ 05/19/2023

Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner

Large pre-trained multimodal models have demonstrated significant succes...

0 Zikang Liu, et al. ∙

research

∙ 04/17/2023

VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

In this paper, we propose a Vision-Audio-Language Omni-peRception pretra...

0 Sihan Chen, et al. ∙

research

∙ 10/09/2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning

Multimodal representation learning has shown promising improvements on v...

0 Zijia Zhao, et al. ∙

research

∙ 07/01/2021

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross...

0 Jing Liu, et al. ∙

research

∙ 01/26/2021

CPTR: Full Transformer Network for Image Captioning

In this paper, we consider the image captioning task from a new sequence...

0 Wei Liu, et al. ∙

research

∙ 01/24/2021

Fast Sequence Generation with Multi-Agent Reinforcement Learning

Autoregressive sequence generation models have achieved state-of-the-art...

0 Longteng Guo, et al. ∙

research

∙ 12/16/2020

AutoCaption: Image Captioning with Neural Architecture Search

Image captioning transforms complex visual information into abstract nat...

0 Xinxin Zhu, et al. ∙

research

∙ 05/10/2020

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning

Most image captioning models are autoregressive, i.e. they generate each...

0 Longteng Guo, et al. ∙

research

∙ 03/19/2020

Normalized and Geometry-Aware Self-Attention Network for Image Captioning

Self-attention (SA) network has shown profound value in image captioning...

0 Longteng Guo, et al. ∙

research

∙ 10/17/2019

Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019

This document describes our solution for the VATEX Captioning Challenge ...

0 Xinxin Zhu, et al. ∙

research

∙ 08/06/2019

Aligning Linguistic Words and Visual Semantic Units for Image Captioning

Image captioning attempts to generate a sentence composed of several lin...

6 Longteng Guo, et al. ∙

