Anwen Hu

research

08/21/2023

Explore and Tell: Embodied Visual Captioning in 3D Environments

While current visual captioning models have achieved impressive performa...

Anwen Hu, et al.

research

07/04/2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Document understanding refers to automatically extracting, analyzing and comp...

Jiabo Ye, et al.

research

06/23/2023

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Image captioning aims to describe visual content in natural language. As...

Zihao Yue, et al.

research

06/07/2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

To promote the development of Vision-Language Pre-training (VLP) and mul...

Haiyang Xu, et al.

research

05/20/2023

Movie101: A New Movie Understanding Benchmark

To help the visually impaired enjoy movies, automatic movie narrating sy...

Zihao Yue, et al.

research

05/10/2023

InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation

Automatic image captioning evaluation is critical for benchmarking and p...

Anwen Hu, et al.

research

04/27/2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Large language models (LLMs) have demonstrated impressive zero-shot abil...

Qinghao Ye, et al.

research

03/12/2023

Accommodating Audio Modality in CLIP for Multimodal Processing

Multimodal processing has attracted much attention lately especially wit...

Ludan Ruan, et al.

research

08/04/2021

Question-controlled Text-aware Image Captioning

For an image with multiple scene texts, different people may be interest...

Anwen Hu, et al.

research

08/04/2021

ICECAP: Information Concentrated Entity-aware Image Captioning

Most current image captioning systems focus on describing general image ...

Anwen Hu, et al.
