We propose DocFormerv2, a multi-modal transformer for Visual Document Un...
We present YORO - a multi-modal transformer encoder-only architecture fo...
Data augmentation is a necessity to enhance data efficiency in deep lear...
Memorization of the relation between entities in a dataset can lead to p...
We propose a novel multimodal architecture for Scene Text Visual Questio...
We present DocFormer – a multi-modal transformer based architecture for ...
Self-supervised representation learning has seen remarkable progress in ...
We propose a new end-to-end trainable model for lossy image compression ...
Deep metric learning (DML) is a popular approach for image retrieval, s...
Recently, there has been much interest in deep learning techniques to do...
Several deep learned lossy compression techniques have been proposed in ...
Logo recognition is the task of identifying and classifying logos. Logo ...
Image similarity involves fetching similar looking images given a refere...