In this paper, we explore, for the first time, helpful multi-modal context...
A Multimodal Large Language Model (MLLM) relies on a powerful LLM to per...
We introduce MQ-Det, an efficient architecture and pre-training strategy...
Open-vocabulary object detection (OVD) aims to scale up vocabulary size ...
Vision transformers (ViTs) are changing the landscape of object detectio...
This paper proposes an Any-time super-Resolution Method (ARM) to tackle ...
Vision transformers have recently gained explosive popularity, but the...
Conventional semi-supervised learning (SSL) methods, e.g., MixMatch, ach...