Xinlei Chen

research

∙ 06/15/2023

Path Generation for Wheeled Robots Autonomous Navigation on Vegetated Terrain

Wheeled robot navigation has been widely used in urban environments, but...

0 Zhuozhu Jian, et al. ∙

research

∙ 06/14/2023

Improving Selective Visual Question Answering by Learning from Your Peers

Despite advances in Visual Question Answering (VQA), the ability of mode...

0 Corentin Dancette, et al. ∙

research

∙ 06/08/2023

R-MAE: Regions Meet Masked Autoencoders

Vision-specific concepts such as "region" have played a key role in exte...

2 Duy-Kien Nguyen, et al. ∙

research

∙ 01/02/2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Driven by improved architectures and better representation learning fram...

0 Sanghyun Woo, et al. ∙

research

∙ 11/23/2022

EurNet: Efficient Multi-Range Relational Modeling of Spatial Multi-Relational Data

Modeling spatial relationships in the data remains critical across many d...

4 Minghao Xu, et al. ∙

research

∙ 10/13/2022

Exploring Long-Sequence Masked Autoencoders

Masked Autoencoding (MAE) has emerged as an effective approach for pre-t...

7 Ronghang Hu, et al. ∙

research

∙ 09/15/2022

Test-Time Training with Masked Autoencoders

Test-time training adapts to a new test distribution on the fly by optim...

4 Yossi Gandelsman, et al. ∙

research

∙ 04/01/2022

On the Importance of Asymmetry for Siamese Representation Learning

Many recent self-supervised frameworks for visual representation learnin...

5 Xiao Wang, et al. ∙

research

∙ 03/10/2022

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval

Dual encoders and cross encoders have been widely used for image-text re...

3 Jie Lei, et al. ∙

research

∙ 02/09/2022

Point-Level Region Contrast for Object Detection Pre-Training

In this work we present point-level region contrast, a self-supervised p...

7 Yutong Bai, et al. ∙

research

∙ 11/22/2021

Benchmarking Detection Transfer Learning with Vision Transformers

Object detection is a central downstream task used to test if pre-traine...

17 Yanghao Li, et al. ∙

research

∙ 11/11/2021

Masked Autoencoders Are Scalable Vision Learners

This paper shows that masked autoencoders (MAE) are scalable self-superv...

45 Kaiming He, et al. ∙

research

∙ 10/11/2021

Towards Demystifying Representation Learning with Non-contrastive Self-supervision

Non-contrastive methods of self-supervised learning (such as BYOL and Si...

2 Xiang Wang, et al. ∙

research

∙ 04/05/2021

An Empirical Study of Training Self-Supervised Vision Transformers

This paper does not describe a novel method. Instead, it studies a strai...

39 Xinlei Chen, et al. ∙

research

∙ 02/12/2021

Understanding self-supervised Learning Dynamics without Contrastive Pairs

Contrastive approaches to self-supervised learning (SSL) learn represent...

9 Yuandong Tian, et al. ∙

research

∙ 11/20/2020

Exploring Simple Siamese Representation Learning

Siamese networks have become a common structure in various recent models...

0 Xinlei Chen, et al. ∙

research

∙ 10/01/2020

Understanding Self-supervised Learning with Dual Deep Networks

We propose a novel theoretical framework to understand self-supervised l...

52 Yuandong Tian, et al. ∙

research

∙ 07/20/2020

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation

We introduce a learning-based approach for room navigation using semanti...

2 Medhini Narasimhan, et al. ∙

research

∙ 06/17/2020

Overcoming Statistical Shortcuts for Open-ended Visual Counting

Machine learning models tend to over-rely on statistical shortcuts. Thes...

12 Corentin Dancette, et al. ∙

research

∙ 04/24/2020

Revisiting Modulated Convolutions for Visual Counting and Beyond

This paper targets visual counting, where the setup is to estimate th...

1 Duy-Kien Nguyen, et al. ∙

research

∙ 03/09/2020

Improved Baselines with Momentum Contrastive Learning

Contrastive unsupervised learning has recently shown encouraging progres...

9 Xinlei Chen, et al. ∙

research

∙ 01/29/2020

ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes

3D object detection has seen quick progress thanks to advances in deep l...

15 Charles R. Qi, et al. ∙

research

∙ 01/10/2020

In Defense of Grid Features for Visual Question Answering

Popularized as 'bottom-up' attention, bounding box (or region) based vis...

7 Huaizu Jiang, et al. ∙

research

∙ 04/18/2019

Towards VQA Models that can Read

Studies have shown that a dominant class of questions asked by visually ...

12 Amanpreet Singh, et al. ∙

research

∙ 04/12/2019

Prior-aware Neural Network for Partially-Supervised Multi-Organ Segmentation

Accurate multi-organ abdominal CT segmentation is essential to many clin...

0 Yuyin Zhou, et al. ∙

research

∙ 04/09/2019

Multi-Target Embodied Question Answering

Embodied Question Answering (EQA) is a relatively new task where an agen...

6 Licheng Yu, et al. ∙

research

∙ 04/09/2019

Embodied Visual Recognition

Passive visual systems typically fail to recognize objects in the amodal...

22 Jianwei Yang, et al. ∙

research

∙ 03/28/2019

TensorMask: A Foundation for Dense Object Segmentation

Sliding-window object detectors that generate bounding-box object predic...

28 Xinlei Chen, et al. ∙

research

∙ 02/15/2019

Cycle-Consistency for Robust Visual Question Answering

Despite significant progress in Visual Question Answering over the years...

14 Meet Shah, et al. ∙

research

∙ 12/29/2018

Relay-Assisted and QoS Aware Scheduling to Overcome Blockage in mmWave Backhaul Networks

In the scenario where small cells are densely deployed, the millimeter w...

0 Yong Niu, et al. ∙

research

∙ 12/20/2018

nocaps: novel object captioning at scale

Image captioning models have achieved impressive results on datasets con...

46 Harsh Agrawal, et al. ∙

research

∙ 12/17/2018

Grounded Video Description

Video description is one of the most challenging problems in vision and ...

8 Luowei Zhou, et al. ∙

research

∙ 07/26/2018

Pythia v0.1: the Winning Entry to the VQA Challenge 2018

This document describes Pythia v0.1, the winning entry from Facebook AI ...

12 Yu Jiang, et al. ∙

research

∙ 03/29/2018

Iterative Visual Reasoning Beyond Convolutions

We present a novel framework for iterative visual reasoning. Our framewo...

0 Xinlei Chen, et al. ∙

research

∙ 12/14/2017

Device-to-Device Communications Enabled Energy Efficient Multicast Scheduling in mmWave Small Cells

To keep pace with the rapid growth of mobile traffic demands, dense depl...

0 Yong Niu, et al. ∙

research

∙ 02/21/2017

PixelNet: Representation of the pixels, by the pixels, and for the pixels

We explore design principles for general pixel-level prediction problems...

0 Aayush Bansal, et al. ∙

research

∙ 02/07/2017

An Implementation of Faster RCNN with Study for Region Sampling

We adapted the joint-training scheme of Faster RCNN framework from Caffe ...

0 Xinlei Chen, et al. ∙

research

∙ 09/21/2016

PixelNet: Towards a General Pixel-level Architecture

We explore architectures for general pixel-level prediction problems, fr...

0 Aayush Bansal, et al. ∙

research

∙ 04/14/2016

Learning Visual Storylines with Skipping Recurrent Neural Networks

What does a typical visit to Paris look like? Do people first take photo...

0 Gunnar A. Sigurdsson, et al. ∙

research

∙ 05/07/2015

Webly Supervised Learning of Convolutional Networks

We present an approach to utilize large amounts of web data for learning...

0 Xinlei Chen, et al. ∙

research

∙ 04/01/2015

Microsoft COCO Captions: Data Collection and Evaluation Server

In this paper we describe the Microsoft COCO Caption dataset and evaluat...

0 Xinlei Chen, et al. ∙

research

∙ 11/20/2014

Learning a Recurrent Visual Representation for Image Caption Generation

In this paper we explore the bi-directional mapping between images and t...

0 Xinlei Chen, et al. ∙
