Continual learning (CL) has attracted increasing attention in the recent...
Research in scene graph generation (SGG) usually considers two-stage mod...
Large visual-language models, like CLIP, learn generalized representatio...
Do video-text transformers learn to model temporal relationships across ...
The problem of continual learning has attracted rising attention in rece...
The problem of class incremental learning (CIL) is considered. State-of-...
The problem of realistic VQA (RVQA), where a model has to reject unanswe...
The problem of adversarial defenses for image classification, where the ...
We present YORO - a multi-modal transformer encoder-only architecture fo...
The complexity-precision trade-off of an object detector is a critical p...
A need to understand and predict vehicles' behavior underlies both publi...
Designing better machine translation systems by considering auxiliary in...
Class-incremental learning (CIL) has been widely studied under the setti...
We consider the problem of omni-supervised object detection, which can u...
Social distancing, an essential public health measure to limit the sprea...
The hypothesis that image datasets gathered online "in the wild" can pro...
Much recent progress has been made in reconstructing the 3D shape of an ...
Significant effort has been recently devoted to modeling visual relation...
This paper aims at addressing the problem of substantial performance deg...
Main challenges in long-tailed recognition come from the imbalanced data...
The problem of long-tailed recognition, where the number of examples per...
The problem of long-tailed recognition, where the number of examples per...
We propose a method to learn, even using a dataset where objects appear ...
We introduce an inversion based method, denoted as IMAge-Guided model IN...
Extensive research in neural style transfer methods has shown that the c...
We present a self-supervised learning method to learn audio and video re...
Recent works of multi-source domain adaptation focus on learning a domai...
Recent research in dynamic convolution shows substantial performance boo...
In this paper, we present MicroNet, which is an efficient convolutional ...
We introduce a novel self-supervised pretext task for learning represent...
Contrastive learning (CL) is a popular technique for self-supervised lea...
Image hash codes are produced by binarizing the embeddings of convolutio...
Long-tail recognition tackles the natural non-uniformly distributed data...
The problem of open-set recognition is considered. While previous approa...
We present a self-supervised learning approach to learn audio-visual rep...
The problem of counterfactual visual explanations is considered. A new f...
Low-precision networks, with weights and activations quantized to low bi...
A volumetric attention (VA) module for 3D medical image segmentation and ...
Multiview recognition has been well studied in the literature and achiev...
A new paradigm is proposed for autonomous driving. The new paradigm lies...
We propose a robust spectrum sensing framework based on deep learning. T...
Real-world applications of object recognition often require the solution...
The problem of multi-domain learning of deep networks is considered. An ...
In object detection, the intersection over union (IoU) threshold is freq...
The transfer of a neural network (CNN) trained to recognize objects to t...
Domain adaptation for semantic image segmentation is very necessary sinc...
Modern machine learning datasets can have biases for certain representat...
Despite increasing efforts on universal representations for visual recog...
One major branch of saliency object detection methods is diffusion-based...
We introduce an approach to convert mono audio recorded by a 360 video c...