While large-scale pre-trained text-to-image models can synthesize divers...
The answering quality of an aligned large language model (LLM) can be
dr...
Existing autoregressive models follow the two-stage generation paradigm ...
Existing vector quantization (VQ) based autoregressive models follow a
t...
Relation prediction on knowledge graphs (KGs) is a key research topic.
D...
In-Context Learning (ICL), which formulates target tasks as prompt compl...
Deep metric learning aims to learn an embedding space, where semanticall...
Scene text spotting is of great importance to the computer vision commun...
Relational triple extraction is challenging for its difficulty in captur...
Chinese spelling check (CSC) is a fundamental NLP task that detects and
...
Text-to-image generation aims at generating realistic images which are
s...
Linguistic knowledge is of great benefit to scene text recognition. Howe...
Entities, as the essential elements in relation extraction tasks, exhibi...
Most Visual Question Answering (VQA) models suffer from the language pri...
Image captioning is a challenging computer vision task, which aims to
ge...
Image-text matching has received growing interest since it bridges visio...
Learning semantic correspondence between image and text is significant a...