Large visual-language models, like CLIP, learn generalized representatio...
The problem of realistic VQA (RVQA), where a model has to reject unanswe...
The problem of adversarial defenses for image classification, where the ...
We present YORO - a multi-modal transformer encoder-only architecture fo...
Understanding the NAND flash memory channel has become more and more cha...
The hypothesis that image datasets gathered online "in the wild" can pro...
Much recent progress has been made in reconstructing the 3D shape of an ...
Contrastive learning (CL) is a popular technique for self-supervised lea...
Long-tail recognition tackles the natural non-uniformly distributed data...
Multiview recognition has been well studied in the literature and achiev...
Camera calibration is a crucial prerequisite in many applications of com...