Large-scale foundation models, such as CLIP, have demonstrated remarkabl...
Although deep learning models have shown impressive performance on super...
In recent years, the success of large-scale vision-language models (VLMs...
Large-scale foundation models (e.g., CLIP) have shown promising zero-sho...