Visually-grounded dialog systems, which integrate multiple modes of
comm...
The Chinese geographic re-ranking task aims to find the most relevant addres...
Large language models (LLMs) have recently demonstrated remarkable
capab...
Precise, seamless, and efficient train localization as well as long-term...
Large language models (LLMs) have shown impressive ability for open-doma...
Recently, instruction-following Large Language Models (LLMs), represent...
Training large language models (LLMs) with open-domain instruction data ...
Vision-Language Pre-training (VLP) methods based on object detection enj...
Measuring the quality of responses generated by LLMs is a challenging ta...
With the rapid evolution of large language models (LLMs), there is a gro...
Vision Transformer (ViT) based Vision-Language Pre-training (VLP) models...
Large language models (LLMs) demonstrate remarkable ability to comprehen...
Document understanding refers to automatically extracting, analyzing, and
comp...
Large language models (LLMs) often contain misleading content, emphasizi...
When trying to answer complex questions, people often rely on multiple
s...
Advances in deep generative models shed light on de novo molecule genera...
Entity Linking (EL) is a fundamental task for Information Extraction and...
To promote the development of Vision-Language Pre-training (VLP) and
mul...
Existing multimodal task-oriented dialog data fails to demonstrate the
d...
We introduce NaSGEC, a new dataset to facilitate research on Chinese
gra...
Perceiving multi-modal information and fulfilling dialogues with humans ...
Large language models (LLMs) have exhibited an emergent in-context learn...
Recently, speech-text pre-training methods have shown remarkable success...
The goal of document-grounded dialogue (DocGD) is to generate a response...
Previous studies have revealed that vanilla pre-trained language models
...
Existing knowledge-enhanced methods have achieved remarkable results in
...
Real-world data often have an open long-tailed distribution, and buildin...
Lifelong learning (LL) is an important ability for NLP models to learn n...
With the fast-developing pace of geographic applications, automatable and
...
Cross-modal contrastive learning in vision language pretraining (VLP) fa...
The MultiCoNER 2 shared task aims to tackle multilingual named entity
re...
Out-of-Domain (OOD) intent detection is vital for practical dialogue sys...
Out-of-distribution (OOD) detection is essential for the reliable and sa...
We propose to Transform Scene Graphs (TSG) into more descriptive caption...
Large language models (LLMs) have demonstrated impressive zero-shot abil...
Non-AutoRegressive (NAR) text generation models have drawn much attentio...
Recent studies have demonstrated the potential of cross-lingual
transfer...
In this paper, we present ChatPLUG, a Chinese open-domain dialogue syste...
Recent research has shown that Large Language Models (LLMs) can utilize
...
Reinforcement Learning from Human Feedback (RLHF) facilitates the alignm...
Automatic evaluation metrics have been facilitating the rapid developmen...
Vision-and-language multi-modal pretraining and fine-tuning have shown g...
Molecular dynamic simulations are important in computational physics,
ch...
Recent years have witnessed a big convergence of language, vision, and
m...
Table-based reasoning has shown remarkable progress in combining deep mo...
Cross-domain NER is a challenging task to address the low-resource probl...
As a core task in location-based services (LBS) (e.g., navigation maps),...
We design a novel global-local Transformer named Ada-ClustFormer
(ACF) t...
Aligning objects with words plays a critical role in Image-Language BERT...
Video-language pre-training has advanced the performance of various
down...