Wenhai Wang

08/04/2023

FB-BEV: BEV Representation from Forward-Backward View Transformations

View Transformation Module (VTM), where transformations happen between m...

Zhiqi Li, et al.

08/03/2023

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

We present the All-Seeing (AS) project: a large-scale data and model for...

Weiyun Wang, et al.

07/03/2023

AVSegFormer: Audio-Visual Segmentation with Transformer

The combination of audio and vision has long been a topic of interest in...

Shengyi Gao, et al.

06/02/2023

Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

The evolution of semantic segmentation has long been dominated by learni...

Zeqiang Lai, et al.

05/18/2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Large language models (LLMs) have notably accelerated progress towards a...

Wenhai Wang, et al.

05/18/2023

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

Automatically generating human-readable text describing the functionalit...

Tong Ye, et al.

05/10/2023

VideoChat: Chat-Centric Video Understanding

In this study, we initiate an exploration into video understanding by in...

Kunchang Li, et al.

05/09/2023

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

We present an interactive visual framework named InternGPT, or iGPT for ...

Zhaoyang Liu, et al.

01/22/2023

Champion Solution for the WSDM2023 Toloka VQA Challenge

In this report, we present our champion solution to the WSDM2023 Toloka ...

Shengyi Gao, et al.

12/20/2022

Goal-oriented Autonomous Driving

Modern autonomous driving systems are characterized as modular tasks in se...

Yihan Hu, et al.

12/03/2022

VLG: General Video Recognition with Web Textual Knowledge

Video recognition in an open and dynamic world is quite challenging, as ...

Jintao Lin, et al.

11/17/2022

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Despite the remarkable success of foundation models, their task-specific...

Hao Li, et al.

11/10/2022

Demystify Transformers & Convolutions in Modern Image Deep Networks

Recent success of vision transformers has inspired a series of vision ba...

Jifeng Dai, et al.

11/10/2022

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Compared to the great progress of large-scale vision transformers (ViTs)...

Wenhai Wang, et al.

09/23/2022

On Efficient Reinforcement Learning for Full-length Game of StarCraft II

StarCraft II (SC2) poses a grand challenge for reinforcement learning (R...

Ruo-Ze Liu, et al.

07/26/2022

Incremental Few-Shot Semantic Segmentation via Embedding Adaptive-Update and Hyper-class Representation

Incremental few-shot semantic segmentation (IFSS) targets incremental...

Guangchen Shi, et al.

06/09/2022

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

To build an artificial neural network like the biological intelligence s...

Jinguo Zhu, et al.

05/17/2022

Vision Transformer Adapter for Dense Predictions

This work investigates a simple yet powerful adapter for Vision Transfor...

Zhe Chen, et al.

04/21/2022

Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks

Industrial control systems (ICSs) are facing increasing cyber-physical a...

Tao Yang, et al.

03/31/2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3D visual perception tasks, including 3D detection and map segmentation ...

Zhiqi Li, et al.

03/16/2022

WegFormer: Transformers for Weakly Supervised Semantic Segmentation

Although convolutional neural networks (CNNs) have achieved remarkable p...

Chunmeng Liu, et al.

11/26/2021

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Deep learning-based models encounter challenges when processing long-tai...

Changyao Tian, et al.

11/03/2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

We propose an accurate and efficient scene text detection framework, ter...

Zhe Chen, et al.

10/20/2021

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

Recent approaches for end-to-end text spotting have achieved promising r...

Humen Zhong, et al.

09/08/2021

Panoptic SegFormer

We present Panoptic SegFormer, a general framework for end-to-end panopt...

Zhiqi Li, et al.

09/04/2021

An empirical evaluation of attention-based multi-head models for improved turbofan engine remaining useful life prediction

A single unit (head) is the conventional input feature extractor in deep...

Abiodun Ayodeji, et al.

08/25/2021

Learning Class-level Prototypes for Few-shot Learning

Few-shot learning aims to recognize new categories using very few labele...

Minglei Yuan, et al.

08/16/2021

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Most polyp segmentation methods use CNNs as their backbone, leading to t...

Bo Dong, et al.

05/31/2021

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

We present SegFormer, a simple, efficient yet powerful semantic segmenta...

Enze Xie, et al.

05/05/2021

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

Reducing the complexity of the pipeline of instance segmentation is cruc...

Enze Xie, et al.

04/14/2021

An Introduction of mini-AlphaStar

StarCraft II (SC2) is a real-time strategy game, in which players produc...

Ruo-Ze Liu, et al.

03/22/2021

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

We present an extremely simple Ultra-Resolution Style Transfer framework...

Zhe Chen, et al.

02/09/2021

DetCo: Unsupervised Contrastive Learning for Object Detection

Unsupervised contrastive learning achieves great success in learning ima...

Enze Xie, et al.

01/21/2021

Trans2Seg: Transparent Object Segmentation with Transformer

This work presents a new fine-grained transparent object segmentation da...

Enze Xie, et al.

11/26/2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training

Although a polygon is a more accurate representation than an upright bou...

Weijia Wu, et al.

08/03/2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

Scene text spotting aims to detect and recognize the entire word or sent...

Wenhai Wang, et al.

07/23/2020

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

Multi-person pose estimation is challenging because it localizes body ke...

Sheng Jin, et al.

05/07/2020

Scene Text Image Super-Resolution in the Wild

Low-resolution text images are often seen in natural scenes such as docu...

Wenjia Wang, et al.

03/31/2020

Segmenting Transparent Objects in the Wild

Transparent objects such as windows and bottles made of glass widely exi...

Enze Xie, et al.

09/29/2019

PolarMask: Single Shot Instance Segmentation with Polar Representation

In this paper, we introduce an anchor-box free and single shot instance ...

Enze Xie, et al.

09/16/2019

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

Scene text recognition has witnessed rapid development with the advance ...

Wenjia Wang, et al.

08/16/2019

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Scene text detection, an important step of scene text reading systems, h...

Wenhai Wang, et al.

03/15/2019

Selective Kernel Networks

In standard Convolutional Neural Networks (CNNs), the receptive fields o...

Xiang Li, et al.

06/07/2018

Shape Robust Text Detection with Progressive Scale Expansion Network

The challenges of shape robust text detection lie in two aspects: 1) mos...

Xiang Li, et al.

02/06/2018

Mixed Link Networks

Based on the analysis by revealing the equivalence of modern networks, ...

Wenhai Wang, et al.
