View Transformation Module (VTM), where transformations happen between m...
We present the All-Seeing (AS) project: a large-scale data and model for...
The combination of audio and vision has long been a topic of interest in...
The evolution of semantic segmentation has long been dominated by learni...
Large language models (LLMs) have notably accelerated progress towards a...
Automatically generating human-readable text describing the functionalit...
In this study, we initiate an exploration into video understanding by in...
We present an interactive visual framework named InternGPT, or iGPT for ...
In this report, we present our champion solution to the WSDM2023 Toloka ...
Modern autonomous driving systems are characterized as modular tasks in se...
Video recognition in an open and dynamic world is quite challenging, as ...
Despite the remarkable success of foundation models, their task-specific...
Recent success of vision transformers has inspired a series of vision ba...
Compared to the great progress of large-scale vision transformers (ViTs)...
StarCraft II (SC2) poses a grand challenge for reinforcement learning (R...
Incremental few-shot semantic segmentation (IFSS) targets incremental...
To build an artificial neural network like the biological intelligence s...
This work investigates a simple yet powerful adapter for Vision Transfor...
Industrial control systems (ICSs) are facing increasing cyber-physical a...
3D visual perception tasks, including 3D detection and map segmentation ...
Although convolutional neural networks (CNNs) have achieved remarkable p...
Deep learning-based models encounter challenges when processing long-tai...
We propose an accurate and efficient scene text detection framework, ter...
Recent approaches for end-to-end text spotting have achieved promising r...
We present Panoptic SegFormer, a general framework for end-to-end panopt...
A single unit (head) is the conventional input feature extractor in deep...
Few-shot learning aims to recognize new categories using very few labele...
Most polyp segmentation methods use CNNs as their backbone, leading to t...
We present SegFormer, a simple, efficient yet powerful semantic segmenta...
Reducing the complexity of the pipeline of instance segmentation is cruc...
StarCraft II (SC2) is a real-time strategy game, in which players produc...
We present an extremely simple Ultra-Resolution Style Transfer framework...
Unsupervised contrastive learning achieves great success in learning ima...
This work presents a new fine-grained transparent object segmentation da...
Although a polygon is a more accurate representation than an upright bou...
Scene text spotting aims to detect and recognize the entire word or sent...
Multi-person pose estimation is challenging because it localizes body ke...
Low-resolution text images are often seen in natural scenes such as docu...
Transparent objects such as windows and bottles made of glass widely exi...
In this paper, we introduce an anchor-box free and single shot instance ...
Scene text recognition has witnessed rapid development with the advance ...
Scene text detection, an important step of scene text reading systems, h...
In standard Convolutional Neural Networks (CNNs), the receptive fields o...
The challenges of shape robust text detection lie in two aspects: 1) mos...
Based on an analysis revealing the equivalence of modern networks, ...