This paper introduces ModelScopeT2V, a text-to-video synthesis model tha...
The pursuit of controllability as a higher standard of visual content cr...
Image captioning models are usually trained according to human annotated...
Urban region function recognition plays a vital role in monitoring ...
Recent image captioning models are achieving impressive results based on...
Any-shot image classification allows recognizing novel classes with onl...
Human-annotated attributes serve as powerful semantic embeddings in zero...
Describing images using natural language is widely known as image captio...
Image classification models have achieved satisfactory performance on ma...
The question answering system can answer questions from various fields a...
Named Entity Recognition (NER) is a challenging task that extracts named...
From the beginning of zero-shot learning research, visual attributes hav...
A wide range of image captioning models has been developed, achieving si...
In this paper, we introduce Adversarial-and-attention Network (A3Net) fo...