The extraordinary ability of generative models to generate photographic
...
Deep convolutional neural networks (CNNs) are often of sophisticated des...
Visual transformer has achieved competitive performance on a variety of
...
Existing approaches in video captioning concentrate on exploring global ...
This paper studies feature pyramid network (FPN), which is a widely used...
Describing a video automatically with natural language is a challenging ...
How to learn a discriminative fine-grained representation is a key point...