Tinne Tuytelaars
Professor at KU Leuven
In this paper, we investigate the task of zero-shot human-object interac...
Neural Radiance Fields (NeRFs) have revolutionized the field of novel vi...
The goal of News Image Captioning is to generate an image caption accord...
The focal point of egocentric video understanding is modelling hand-obje...
During training, supervised object detection tries to correctly match th...
Many two-stage instance segmentation heads predict a coarse 28x28 mask p...
The intrinsic difficulty in adapting deep learning models to non-station...
A personalized KeyWord Spotting (KWS) pipeline typically requires the tr...
Class-incremental learning (CIL) is a particularly challenging variant o...
The success of the Neural Radiance Fields (NeRFs) for modeling and free-...
By default, neural networks learn on all training data at once. When suc...
Most self-supervised methods for representation learning leverage a cros...
Learning dense visual representations without labels is an arduous task ...
Human object interaction (HOI) detection plays a crucial role in human-c...
In this work, we study the problem of Embodied Referring Expression Grou...
This work evaluates and analyzes the combination of imitation learning (...
Vision-language pre-training (VLP) has attracted increasing attention re...
We revisit the weakly supervised cross-modal face-name alignment task; t...
In this paper we describe the design and the ideas motivating a new Cont...
Recently, two-stage Deformable DETR introduced the query-based two-stage...
Pose estimation is usually tackled as either a bin classification proble...
The downstream accuracy of self-supervised methods is tightly linked to ...
To ensure user acceptance of autonomous vehicles (AVs), control systems ...
Introducing a time dependency on the data generating distribution has pr...
Pre-trained models are nowadays a fundamental component of machine learn...
Training models continually to detect and classify objects, from new cla...
As natural images usually contain multiple objects, multi-label image cl...
In the online continual learning paradigm, agents must learn from a chan...
Visual question answering is a vision-and-language multimodal task that...
This paper attacks the problem of language-guided navigation in a new pe...
The gap between simulation and the real-world restrains many machine lea...
Recognizing human actions is fundamentally a spatio-temporal reasoning p...
Feature pyramids have become ubiquitous in multi-scale computer vision t...
Active visual exploration aims to assist an agent with a limited field o...
In this paper we propose BlockCopy, a scheme that accelerates pretrained...
One of the most common problems of weakly supervised object localization...
Explainable AI (XAI) methods focus on explaining what a neural network h...
Learning from non-stationary data streams and overcoming catastrophic fo...
We study the online continual learning paradigm, where agents must learn...
SegBlocks reduces the computational cost of existing neural networks, by...
We present a method for adversarial attack detection based on the inspec...
In this paper, we consider the problem of fine-grained image retrieval i...
We present a new framework for self-supervised representation learning b...
Training deep learning models on embedded devices is typically avoided s...
The task of visual grounding requires locating the most relevant region ...
Given a really low-resolution input image of a face (say 16x16 or 8x8 pi...
As learning from non-stationary streams of data has been proven a challe...
The determination of the relative 6 Degree of Freedom (DoF) pose of vehi...
In a dynamic environment, an agent with a limited field of view/resource...