Vision and Language Models (VLMs), such as CLIP, have enabled visual rec...
Recently, large-scale pre-trained Vision and Language (VL) models have s...
We propose MATE, the first Test-Time-Training (TTT) method designed for ...
Domain adaptation is crucial for adapting a learned model to new scenarios, ...