Recent advances in semi-supervised semantic segmentation have been heavi...
Audio-visual navigation is an audio-targeted wayfinding task where a rob...
VLN-CE is a recently released embodied task, where AI agents need to nav...
Vision-language navigation (VLN), which entails an agent to navigate 3D
...
Point cloud analysis (such as 3D segmentation and detection) is a challe...
As the size of transformer-based models continues to grow, fine-tuning t...
We introduce a novel speaker model Kefa for navigation instruction
gener...
This report presents a framework called Segment And Track Anything (SAMT...
We present CLUSTSEG, a general, transformer-based framework that tackles...
Current top-leading solutions for video object segmentation (VOS) typica...
The objective of this paper is self-supervised learning of video object
...
Recently, visual-language navigation (VLN) – entailing robot agents to
f...
With the emergence of varied visual navigation tasks (e.g,
image-/object...
Neural-symbolic computing (NeSy), which pursues the integration of the
s...
Prevalent semantic segmentation solutions are, in essence, a dense
discr...
Prevalent state-of-the-art instance segmentation methods fall into a
que...
We devise deep nearest centroids (DNC), a conceptually elegant yet
surpr...
Dominated point cloud-based 3D object detectors in autonomous driving
sc...
Existing approaches for unsupervised point cloud pre-training are constr...
Reference-based image super-resolution (RefSR) aims to exploit auxiliary...
Vision-language navigation is the task of directing an embodied agent to...
Since the rise of vision-language navigation (VLN), great progress has b...
Prevalent semantic segmentation solutions, despite their different netwo...
Humans are able to recognize structured relations in observation, allowi...
Our target is to learn visual correspondence from unlabeled videos. We
d...
Abductive reasoning seeks the likeliest possible explanation for partial...
We explore the task of language-guided video segmentation (LVS). Previou...
Traditional domain adaptation addresses the task of adapting a model to ...
Video segmentation, i.e., partitioning video frames into multiple segmen...
As a fundamental problem for Artificial Intelligence, multi-agent system...
Referring video object segmentation (RVOS) aims to segment video objects...
Language-queried video actor segmentation aims to predict the pixel-leve...
On existing public benchmarks, face forgery detection techniques have
ac...
To address the challenging task of instance-aware human part parsing, a ...
Recently, numerous algorithms have been developed to tackle the problem ...
Current semantic segmentation methods focus only on mining "local" conte...
Learning from imperfect data becomes an issue in many industrial applica...
It is laborious to manually label point cloud data for training high-qua...
Vision-language navigation (VLN) is the task of entailing an agent to ca...
How to make a segmentation model to efficiently adapt to a specific vide...
This paper studies the problem of learning semantic segmentation from
im...
Current popular online multi-object tracking (MOT) solutions apply singl...
We propose a new method for video object segmentation (VOS) that address...
Human parsing is for pixel-wise human semantic understanding. As human b...
Rapid progress has been witnessed for human-object interaction (HOI)
rec...
This paper proposes a human-aware deblurring model that disentangles the...
We introduce a novel network, called CO-attention Siamese Network (COSNe...
This work proposes a novel attentive graph neural network (AGNN) for
zer...
This work proposes to combine neural networks with the compositional
hie...
This paper addresses a new problem of understanding human gaze communica...