Scene Text Recognition (STR) models have achieved high performance in re...
Despite recent progress in video and language representation learning, t...
In traditional Visual Question Generation (VQG), most images have multip...
Data augmentation is an approach that can effectively improve the perfor...
Reinforcement Learning (RL) is a powerful framework to address the
discr...
Conventional models for Visual Question Answering (VQA) explore determin...
The existing action tubelet detectors mainly depend on heuristic anchor ...
A major obstacle to the development of Natural Language Processing (NLP)...