We propose a self-supervised method for learning representations based o...
The goal in episodic memory (EM) is to search a long egocentric video to...
Despite the advancement of machine learning techniques in recent years, ...
Searching long egocentric videos with natural language queries (NLQ) has...
Room impulse response (RIR) functions capture how the surrounding physic...
In reinforcement learning for visual navigation, it is common to develop...
We explore active audio-visual separation for dynamic sound sources, whe...
State-of-the-art approaches to ObjectGoal navigation rely on reinforceme...
We introduce the active audio-visual source separation problem, where an...
We introduce environment predictive coding, a self-supervised approach t...
Recent work on audio-visual navigation assumes a constantly-sounding tar...
The evolution of clothing styles and their migration across the world is...
In audio-visual navigation, an agent intelligently travels through a com...
State-of-the-art navigation methods leverage a spatial memory to general...
Several animal species (e.g., bats, dolphins, and whales) and even visua...
Moving around in the world is naturally a multisensory experience, but t...
Due to the lack of large-scale datasets, the prevailing approach in visu...
Image retrieval is one of the most popular tasks in computer vision. How...
Novelty detection is crucial for real-life applications. While it is com...
What is the future of fashion? Tackling this question from a data-driven...
The Earth Mover's Distance (EMD) computes the optimal cost of transformi...
Collecting training images for all visual categories is not only expensi...
Attribute-based knowledge transfer has proven very successful in visual ...