Large Language Models (LLMs) have the capacity to perform complex sch...
Large language models (LLMs), such as ChatGPT, are able to generate huma...
We show that Vision-Language Transformers can be learned without human l...
The primary focus of recent work with large-scale transformers has been o...
Generating formal language represented by relational tuples, such as Lis...
Popular metrics used for evaluating image captioning systems, such as BL...
This paper presents a new metric called TIGEr for the automatic evaluati...
In this paper, we propose Object-driven Attentive Generative Adversarial...
Vision-language navigation (VLN) is the task of navigating an embodied a...
We propose a hierarchically structured reinforcement learning approach t...
In this paper, we study the problems of both image captioning and text-to...
We study how to generate captions that are not only accurate in describi...
This paper proposes a new architecture - Attentive Tensor Product Learni...
While deep learning has pushed the boundaries in various machine learnin...
In this paper, we propose an Attentional Generative Adversarial Network...
Deep learning (DL) has in recent years been widely used in natural langu...
We present a new approach to the design of deep networks for natural lan...
We present a new tensor product generation network (TPGN) that generates...