Hao Tan

research

∙ 07/24/2023

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

Punctuation restoration is an important task in automatic speech recogni...

0 Viet Dac Lai, et al. ∙

research

∙ 07/23/2023

Learning Navigational Visual Representations with Semantic Map Supervision

Being able to perceive the semantics and the spatial structure of the en...

0 Yicong Hong, et al. ∙

research

∙ 06/09/2023

DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Vision-language pretraining models have achieved great success in suppor...

0 Fuxiao Liu, et al. ∙

research

∙ 05/19/2023

Graph Propagation Transformer for Graph Representation Learning

This paper presents a novel transformer architecture for graph represent...

0 Zhe Chen, et al. ∙

research

∙ 07/05/2022

CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

Vision-and-Language Navigation (VLN) tasks require an agent to navigate ...

4 Jialu Li, et al. ∙

research

∙ 03/29/2022

EnvEdit: Environment Editing for Vision-and-Language Navigation

In Vision-and-Language Navigation (VLN), an agent needs to navigate thro...

7 Jialu Li, et al. ∙

research

∙ 07/13/2021

How Much Can CLIP Benefit Vision-and-Language Tasks?

Most existing Vision-and-Language (V&L) models rely on pre-trained vis...

7 Sheng Shen, et al. ∙

research

∙ 07/06/2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

Since visual perception can give rich information beyond text descriptio...

3 Zineng Tang, et al. ∙

research

∙ 06/21/2021

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning

Video understanding relies on perceiving the global content and modeling...

8 Hao Tan, et al. ∙

research

∙ 04/19/2021

Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information

Vision language navigation is the task that requires an agent to navigat...

10 Jialu Li, et al. ∙

research

∙ 02/04/2021

Unifying Vision-and-Language Tasks via Text Generation

Existing methods for vision-and-language learning typically require desi...

38 Jaemin Cho, et al. ∙

research

∙ 11/15/2020

ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

For embodied agents, navigation is an important ability but not an isola...

3 Hyounghun Kim, et al. ∙

research

∙ 10/14/2020

Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Humans learn language by listening, speaking, writing, reading, and also...

3 Hao Tan, et al. ∙

research

∙ 10/12/2020

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding

Phrase localization is a task that studies the mapping from textual phra...

1 Qinxin Wang, et al. ∙

research

∙ 09/14/2020

RelativeNAS: Relative Neural Architecture Search via Slow-Fast Learning

Despite the remarkable successes of Convolutional Neural Networks (CNNs)...

14 Hao Tan, et al. ∙

research

∙ 05/06/2020

Diagnosing the Environment Bias in Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) requires an agent to follow natural...

4 Yubo Zhang, et al. ∙

research

∙ 04/28/2020

The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions

We find that the performance of state-of-the-art models on Natural Langu...

0 Xiang Zhou, et al. ∙

research

∙ 01/17/2020

Modality-Balanced Models for Visual Dialogue

The Visual Dialog task requires a model to exploit both image and conver...

22 Hyounghun Kim, et al. ∙

research

∙ 08/20/2019

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

Vision-and-language reasoning requires an understanding of visual concep...

0 Hao Tan, et al. ∙

research

∙ 06/18/2019

Expressing Visual Relationships via Language

Describing images with text is a fundamental problem in vision-language ...

8 Hao Tan, et al. ∙

research

∙ 04/29/2019

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

Enabling robots to understand instructions provided via spoken natural l...

0 Haonan Chen, et al. ∙

research

∙ 04/08/2019

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

A grand goal in AI is to build a robot that can accurately navigate base...

0 Hao Tan, et al. ∙

research

∙ 03/07/2019

Shallow Overlay Trees Suffice for High-Throughput Consensus

All-to-all data transmission is a typical data transmission pattern in b...

0 Hao Tan, et al. ∙

research

∙ 04/18/2018

Object Ordering with Bidirectional Matchings for Visual Reasoning

Visual reasoning with compositional natural language instructions, e.g.,...

0 Hao Tan, et al. ∙

research

∙ 07/12/2017

Source-Target Inference Models for Spatial Instruction Understanding

Models that can execute natural language instructions for situated robot...

0 Hao Tan, et al. ∙

research

∙ 12/30/2016

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

Referring expressions are natural language constructions used to identif...

0 Licheng Yu, et al. ∙
