Large vision-language models (LVLMs) have recently witnessed rapid
advan...
We introduce the Qwen-VL series, a set of large-scale vision-language mo...
In this work, we explore a scalable way for building a general represent...
Generalist models, which are capable of performing diverse multi-modal t...
Generative modeling of human motion has broad applications in computer
a...
Virtual try-on aims to generate a photo-realistic fitting result given a...
The fashion industry has diverse applications in multi-modal image gener...
In this work, we pursue a unified paradigm for multimodal pretraining to...
Vehicle search is one basic task for the efficient traffic management in...
Conventional deep learning based methods for object detection require a ...
Recent works have made great progress in semantic segmentation by exploi...
For visual tracking, most of the traditional correlation filters (CF) ba...