Wei Ji

research

∙ 09/11/2023

NExT-GPT: Any-to-Any Multimodal LLM

While recently Multimodal Large Language Models (MM-LLMs) have made exci...

0 Shengqiong Wu, et al. ∙

research

∙ 08/26/2023

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models

Text-to-video (T2V) synthesis has gained increasing attention in the com...

0 Hao Fei, et al. ∙

research

∙ 08/19/2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval

Recent studies have shown that dense retrieval models, lacking dedicated...

0 Kaihang Pan, et al. ∙

research

∙ 08/08/2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

Multimodal Large Language Models (MLLMs) have recently sparked significa...

0 Juncheng Li, et al. ∙

research

∙ 08/08/2023

Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation

Multi-modal recommendation systems, which integrate diverse types of inf...

0 Wei Ji, et al. ∙

research

∙ 07/28/2023

Panoptic Scene Graph Generation with Semantics-prototype Learning

Panoptic Scene Graph Generation (PSG) parses objects and predicts their ...

0 Li Li, et al. ∙

research

∙ 06/08/2023

Two Heads Are Better Than One: Improving Fake News Video Detection by Correlating with Neighbors

The prevalence of short video platforms has spawned a lot of fake news v...

0 Peng Qi, et al. ∙

research

∙ 05/20/2023

Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment

Unpaired cross-lingual image captioning has long suffered from irrelevan...

0 Shengqiong Wu, et al. ∙

research

∙ 05/19/2023

Generating Visual Spatial Description via Holistic 3D Scene Understanding

Visual spatial description (VSD) aims to generate texts that describe th...

0 Yu Zhao, et al. ∙

research

∙ 05/02/2023

Transfer Visual Prompt Generator across LLMs

While developing a new vision-language LLM (VL-LLM) by pre-training on t...

0 Ao Zhang, et al. ∙

research

∙ 04/12/2023

Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications

Recently, Meta AI Research approaches a general, promptable Segment Anyt...

0 Wei Ji, et al. ∙

research

∙ 03/23/2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

Scene Graph Generation (SGG) aims to extract <subject, predicate, object...

0 Qifan Yu, et al. ∙

research

∙ 03/12/2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

Prompt tuning, a recently emerging paradigm, enables the powerful vision...

0 Juncheng Li, et al. ∙

research

∙ 02/25/2023

Scalable Attribution of Adversarial Attacks via Multi-Task Learning

Deep neural networks (DNNs) can be easily fooled by adversarial attacks ...

0 Zhongyi Guo, et al. ∙

research

∙ 12/22/2022

Multi-queue Momentum Contrast for Microvideo-Product Retrieval

The booming development and huge market of micro-videos bring new e-comm...

0 Yali Du, et al. ∙

research

∙ 12/21/2022

Dynamic Speed Guidance for CAV Ramp Merging in Non-Cooperative Environment: An On-Site Experiment

Ramp merging is a typical application of cooperative intelligent transpo...

0 Wei Ji, et al. ∙

research

∙ 11/20/2022

FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms

Short video platforms have become an important channel for news sharing,...

0 Peng Qi, et al. ∙

research

∙ 11/14/2022

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

We investigate composed image retrieval with text feedback. Users gradua...

0 Yiyang Chen, et al. ∙

research

∙ 07/21/2022

MetaComp: Learning to Adapt for Online Depth Completion

Relying on deep supervised or self-supervised learning, previous methods...

0 Yang Chen, et al. ∙

research

∙ 06/06/2022

Invariant Grounding for Video Question Answering

Video Question Answering (VideoQA) is the task of answering questions ab...

0 Yicong Li, et al. ∙

research

∙ 05/23/2022

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models

Vision-language pre-training (VLP) has shown impressive performance on a...

1 Yuan Yao, et al. ∙

research

∙ 05/15/2022

Promoting Saliency From Depth: Deep Unsupervised RGB-D Saliency Detection

Growing interests in RGB-D salient object detection (RGB-D SOD) have bee...

0 Wei Ji, et al. ∙

research

∙ 04/27/2022

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

This research aims to study a self-supervised 3D clothing reconstruction...

4 Zhedong Zheng, et al. ∙

research

∙ 03/22/2022

Fine-Grained Scene Graph Generation with Data Transfer

Scene graph generation (SGG) aims to extract (subject, predicate, object...

0 Ao Zhang, et al. ∙

research

∙ 03/02/2022

Video Question Answering: Datasets, Algorithms and Challenges

Video Question Answering (VideoQA) aims to answer natural language quest...

0 Yaoyao Zhong, et al. ∙

research

∙ 02/26/2022

Content-Variant Reference Image Quality Assessment via Knowledge Distillation

Generally, humans are more skilled at perceiving differences between hig...

0 Guanghao Yin, et al. ∙

research

∙ 12/12/2021

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

Video question answering requires the models to understand and reason ab...

0 Junbin Xiao, et al. ∙

research

∙ 12/10/2021

Rethinking the Two-Stage Framework for Grounded Situation Recognition

Grounded Situation Recognition (GSR), i.e., recognizing the salient acti...

0 Meng Wei, et al. ∙

research

∙ 11/16/2021

Meeting Summarization with Pre-training and Clustering Methods

Automatic meeting summarization is becoming increasingly popular these d...

0 Andras Huebner, et al. ∙

research

∙ 11/01/2021

PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices

The better accuracy and efficiency trade-off has been a challenging prob...

0 Guanghua Yu, et al. ∙

research

∙ 08/15/2021

Human Pose and Shape Estimation from Single Polarization Images

This paper focuses on a new problem of estimating human pose and shape f...

0 Shihao Zou, et al. ∙

research

∙ 06/03/2021

Deconfounded Video Moment Retrieval with Causal Intervention

We tackle the task of video moment retrieval (VMR), which aims to locali...

0 Xun Yang, et al. ∙

research

∙ 05/26/2021

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Weakly-Supervised Object Detection (WSOD) and Localization (WSOL), i.e.,...

0 Feifei Shao, et al. ∙

research

∙ 03/15/2021

Boundary Proposal Network for Two-Stage Natural Language Video Localization

We aim to address the problem of Natural Language Video Localization (NL...

0 Shaoning Xiao, et al. ∙

research

∙ 07/23/2020

Accurate RGB-D Salient Object Detection via Collaborative Learning

Benefiting from the spatial cues embedded in depth images, recent progre...

0 Wei Ji, et al. ∙

research

∙ 04/30/2020

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

The COVID-19 outbreak was announced as a global pandemic by the World He...

0 Jing Han, et al. ∙

research

∙ 10/06/2018

Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images

As a fundamental and challenging problem in computer vision, hand pose e...

2 Yiming Wu, et al. ∙
