Qinghao Ye

08/29/2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models

Large Vision-Language Models (LVLMs) have recently achieved remarkable s...

0 Junyang Wang, et al.

08/07/2023

COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment

Vision-Language Pre-training (VLP) methods based on object detection enj...

0 Chaoya Jiang, et al.

07/17/2023

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models...

0 Chaoya Jiang, et al.

07/04/2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Document understanding refers to automatically extract, analyze and comp...

0 Jiabo Ye, et al.

06/07/2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

To promote the development of Vision-Language Pre-training (VLP) and mul...

0 Haiyang Xu, et al.

05/03/2023

Transforming Visual Scene Graphs to Image Captions

We propose to Transform Scene Graphs (TSG) into more descriptive caption...

0 Xu Yang, et al.

04/27/2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Large language models (LLMs) have demonstrated impressive zero-shot abil...

0 Qinghao Ye, et al.

04/16/2023

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

In this paper, we present ChatPLUG, a Chinese open-domain dialogue syste...

0 Junfeng Tian, et al.

02/01/2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Recent years have witnessed a big convergence of language, vision, and m...

0 Haiyang Xu, et al.

01/05/2023

Learning Trajectory-Word Alignments for Video-Language Tasks

Aligning objects with words plays a critical role in Image-Language BERT...

0 Xu Yang, et al.

12/30/2022

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Video-language pre-training has advanced the performance of various down...

0 Qinghao Ye, et al.

05/06/2022

All Grains, One Scheme (AGOS): Learning Multi-grain Instance Representation for Aerial Scene Classification

Aerial scene classification remains challenging as: 1) the size of key o...

0 Qi Bi, et al.

01/31/2022

AI-based Medical e-Diagnosis for Fast and Automatic Ventricular Volume Measurement in the Patients with Normal Pressure Hydrocephalus

Based on CT and MRI images acquired from normal pressure hydrocephalus (...

0 Xi Zhou, et al.

01/27/2022

Exploring Global Diversity and Local Context for Video Summarization

Video summarization aims to automatically generate a diverse and concise...

9 Yingchao Pan, et al.

12/09/2021

Robust Weakly Supervised Learning for COVID-19 Recognition Using Multi-Center CT Images

The world is currently experiencing an ongoing pandemic of an infectious...

5 Qinghao Ye, et al.

02/03/2021

Unbox the Black-box for the Medical Explainable AI via Multi-modal and Multi-centre Data Fusion: A Mini-Review, Two Showcases and Beyond

Explainable Artificial Intelligence (XAI) is an emerging research topic ...

19 Guang Yang, et al.

09/23/2020

Exploring global diverse attention via pairwise temporal relation for video summarization

Video summarization is an effective way to facilitate video searching an...

6 Ping Li, et al.

11/28/2019

Application of Time Series Analysis to Traffic Accidents in Los Angeles

With the improvements of Los Angeles in many aspects, people in mounting...

0 Qinghao Ye, et al.
