Peihao Chen

research

∙ 08/15/2023

A^2Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

We study the task of zero-shot vision-and-language navigation (ZS-VLN), ...

Peihao Chen, et al.

research

∙ 07/24/2023

3D-LLM: Injecting the 3D World into Large Language Models

Large language models (LLMs) and Vision-Language Models (VLMs) have been...

Yining Hong, et al.

research

∙ 07/22/2023

Learning Vision-and-Language Navigation from YouTube Videos

Vision-and-language navigation (VLN) requires an embodied agent to navig...

Kunyang Lin, et al.

research

∙ 07/20/2023

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

This paper presents a paradigm that adapts general large-scale pretraine...

Weidong Chen, et al.

research

∙ 03/21/2023

Detecting the open-world objects with the help of the Brain

Open World Object Detection (OWOD) is a novel computer vision task with ...

Shuailei Ma, et al.

research

∙ 10/14/2022

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

We address a practical yet challenging problem of training robot agents ...

Peihao Chen, et al.

research

∙ 10/14/2022

Learning Active Camera for Multi-Object Navigation

Getting robots to navigate to multiple objects autonomously is essential...

Peihao Chen, et al.

research

∙ 10/12/2022

M^3Video: Masked Motion Modeling for Self-Supervised Video Representation Learning

We study self-supervised video representation learning that seeks to lea...

Xinyu Sun, et al.

research

∙ 08/07/2020

Location-aware Graph Convolutional Networks for Video Question Answering

We addressed the challenging task of video question answering, which req...

Deng Huang, et al.

research

∙ 07/21/2020

Foley Music: Learning to Generate Music from Videos

In this paper, we introduce Foley Music, a system that can synthesize pl...

Chuang Gan, et al.

research

∙ 07/14/2020

Generating Visually Aligned Sound from Videos

We focus on the task of generating sound from natural videos, and the so...

Peihao Chen, et al.

research

∙ 04/07/2020

Dense Regression Network for Video Grounding

We address the problem of video grounding from natural language queries....

Runhao Zeng, et al.

research

∙ 10/25/2019

Self-supervised Moving Vehicle Tracking with Stereo Sound

Humans are able to localize objects in the environment using both visual...

Chuang Gan, et al.
