False negatives (FN) in 3D object detection, e.g., missing predictions
o...
View Transformation Module (VTM), where transformations happen between
m...
This technical report summarizes the winning solution for the 3D Occupan...
In recent years, differential privacy has seen significant advancements ...
We present a one-shot method to infer and render a photorealistic 3D
rep...
Recent vision-language models have shown impressive multi-modal generati...
Humans can easily imagine the complete 3D geometry of occluded objects a...
Augmenting pretrained language models (LMs) with a vision encoder (e.g.,...
We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mas...
This report describes the winning solution to the semantic segmentation ...
Pre-trained vision-language models (e.g., CLIP) have shown promising
zer...
3D point clouds are becoming a critical data representation in many real-w...
We propose MinVIS, a minimal video instance segmentation (VIS) framework...
Given a small training data set and a learning algorithm, how much more ...
A significant gap remains between today's visual pattern recognition mod...
Recent studies show that Vision Transformers (ViTs) exhibit strong robust...
Reasoning about visual relationships is central to how humans interpret ...
In this paper, we propose M^2BEV, a unified framework that jointly perfo...
Instance segmentation is a fundamental vision task that aims to recogniz...
Data augmentation is a simple yet effective way to improve the robustnes...
We present Panoptic SegFormer, a general framework for end-to-end panopt...
Deep neural networks have reached very high accuracy on object detection...
Generalization has been a long-standing challenge for reinforcement lear...
The open-world deployment of Machine Learning (ML) algorithms in
safety-...
We present SegFormer, a simple, efficient yet powerful semantic segmenta...
We introduce DiscoBox, a novel framework that jointly learns instance
se...
Training on datasets with long-tailed distributions has been challenging...
Training on synthetic data can be beneficial for label or data-scarce
sc...
Existing work on object detection often relies on a single form of
annot...
We propose a distributionally robust learning (DRL) method for unsupervi...
Humans have an inherent ability to learn novel concepts from only a few
...
Aliasing refers to the phenomenon that high frequency signals degenerate...
Although significant progress has been witnessed in supervised person
...
Recent generative adversarial networks (GANs) are able to generate impre...
Neural networks are vulnerable to input perturbations such as additive n...
Conventional CNNs for texture synthesis consist of a sequence of
(de)-co...
Models trained on synthetic images often face degraded generalization to...
Although having achieved great success in medical image segmentation, de...
Weakly supervised learning has emerged as a compelling tool for object
d...
Although convolutional neural networks (CNNs) are inspired by the mechan...
Recent advances in domain adaptation show that deep self-training presen...
Recent work on minimum hyperspherical energy (MHE) has demonstrated its
...
Person re-identification (re-id) remains challenging due to significant
...
In this paper, we present a simple yet effective padding scheme that can...
Recent deep networks achieved state-of-the-art performance on a variety ...
Edge detection is among the most fundamental vision problems for its rol...
Neural networks are a powerful class of nonlinear functions that can be
...
Inner product-based convolution has been a central component of convolut...
A family of super deep networks, referred to as residual networks or Res...
Convolution as inner product has been the founding basis of convolutiona...