Patch-Level Contrasting without Patch Correspondence for Accurate and Dense Contrastive Representation Learning
We propose ADCLR: Accurate and Dense Contrastive Representation Learning, a novel self-supervised learning framework for learning accurate and dense vision representations. To extract spatially sensitive information, ADCLR introduces query patches for contrasting in addition to global contrasting. Compared with previous dense contrasting methods, ADCLR mainly enjoys three merits: i) achieving both globally discriminative and spatially sensitive representations, ii) model efficiency (no extra parameters beyond the global contrasting baseline), and iii) being correspondence-free and thus simpler to implement. Our approach achieves new state-of-the-art performance among contrastive methods. On classification tasks, for ViT-S, ADCLR achieves 77.5% top-1 accuracy on ImageNet with linear probing, outperforming our baseline DINO (i.e., without our devised techniques as a plug-in) by 0.5%. For ViT-B, ADCLR achieves 79.8% and 84.0% accuracy on ImageNet by linear probing and finetuning, outperforming iBOT by 0.3% and 0.2%. For dense tasks, on MS-COCO, ADCLR achieves significant improvements of 44.3% mAP for object detection and 39.7% mAP for instance segmentation, outperforming the previous SOTA method SelfPatch by 2.2% and 1.2%, respectively. On ADE20K, ADCLR outperforms SelfPatch by 1.0% mIoU on the segmentation task.
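To make the correspondence-free idea concrete, below is a minimal, illustrative PyTorch sketch of query-patch contrasting layered on DINO-style global contrasting. This is not the authors' implementation: the TinyEncoder stub, the temperatures, the token layout, and all names (contrast, adclr_style_loss, queries) are assumptions for illustration only. The point it demonstrates is that the same query tokens are appended to both views, so each student query token is compared with the teacher's output for the identical query, and no patch correspondence between views ever needs to be computed.

    # Minimal sketch (not the authors' code) of query-patch contrasting
    # alongside global contrasting, under the assumptions stated above.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyEncoder(nn.Module):
        """Stand-in for a ViT: maps a token sequence to a token sequence."""
        def __init__(self, dim=64):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, tokens):
            return self.blocks(tokens)

    def contrast(student_out, teacher_out, temp_s=0.1, temp_t=0.04):
        """DINO-style cross-entropy between teacher and student distributions."""
        t = F.softmax(teacher_out.detach() / temp_t, dim=-1)
        s = F.log_softmax(student_out / temp_s, dim=-1)
        return -(t * s).sum(dim=-1).mean()

    def adclr_style_loss(student, teacher, view1, view2, queries):
        """Global + query-patch contrasting. `queries` (B, Q, D) are the SAME
        query tokens appended to both views, so matching is purely positional:
        correspondence-free by construction."""
        def encode(net, view):
            tokens = torch.cat([view, queries], dim=1)  # append query patches
            out = net(tokens)
            cls_tok = out[:, 0]                         # global ([CLS]-like) token
            q_tok = out[:, -queries.shape[1]:]          # query-patch tokens
            return cls_tok, q_tok

        s_cls, s_q = encode(student, view1)
        with torch.no_grad():                           # teacher gets no gradient
            t_cls, t_q = encode(teacher, view2)

        loss_global = contrast(s_cls, t_cls)            # image-level term
        loss_patch = contrast(s_q, t_q)                 # patch-level term
        return loss_global + loss_patch

    # Usage: two augmented views as pre-embedded token sequences (B, N, D).
    student, teacher = TinyEncoder(), TinyEncoder()
    v1, v2 = torch.randn(2, 17, 64), torch.randn(2, 17, 64)
    queries = torch.randn(2, 4, 64)                     # 4 query patches per image
    print(adclr_style_loss(student, teacher, v1, v2, queries).item())

Note the design consequence this sketch highlights: because the patch-level term reuses the encoder and projection of the global branch, no extra parameters are introduced, matching merit ii) above.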