Desmond Elliott

LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
Rita Ramos, et al. ∙ 05/31/2023
Multilingual image captioning has recently been tackled by training with...

Data Curation for Image Captioning with Text-to-Image Generative Models
Wenyan Li, et al. ∙ 05/05/2023
Recent advances in image captioning are mainly driven by large-scale vis...

Retrieval-augmented Image Captioning
Rita Ramos, et al. ∙ 02/16/2023
Inspired by retrieval-augmented language generation and pretrained Visio...

Multilingual Multimodal Learning with Machine Translated Text
Chen Qiu, et al. ∙ 10/24/2022
Most vision-and-language pretraining research focuses on English tasks. ...

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification
Ilias Chalkidis, et al. ∙ 10/11/2022
Non-hierarchical sparse attention Transformer-based models, such as Long...

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
Rita Ramos, et al. ∙ 09/30/2022
Recent advances in image captioning have focused on scaling the data and...

Language Modelling with Pixels
Phillip Rust, et al. ∙ 07/14/2022
Language models are defined over a finite set of inputs, which creates a...

Revisiting Transformer-based Models for Long Document Classification
Xiang Dai, et al. ∙ 04/14/2022
The recent literature in text classification is biased towards short tex...

MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model
Rasmus Kær Jørgensen, et al. ∙ 09/14/2021
Domain adaptive pretraining, i.e. the continued unsupervised pretraining...

Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Stella Frank, et al. ∙ 09/09/2021
Pretrained vision-and-language BERTs aim to learn representations that c...

The Role of Syntactic Planning in Compositional Image Captioning
Emanuele Bugliarello, et al. ∙ 01/28/2021
Image captioning has focused on generalizing to images drawn from the sa...

Multimodal Pretraining Unmasked: Unifying the Vision and Language BERTs
Emanuele Bugliarello, et al. ∙ 11/30/2020
Large-scale pretraining and task-specific fine-tuning is now the standar...

Multimodal Speech Recognition with Unstructured Audio Masking
Tejas Srinivasan, et al. ∙ 10/16/2020
Visual context has been shown to be useful for automatic speech recognit...

Textual Supervision for Visually Grounded Spoken Language Understanding
Bertrand Higy, et al. ∙ 10/06/2020
Visually-grounded models of spoken language understanding extract semant...

Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan, et al. ∙ 10/05/2020
Multimodal automatic speech recognition systems integrate information fr...

CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
Alessandro Suglia, et al. ∙ 06/03/2020
Approaches to Grounded Language Learning typically focus on a single tas...

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Mostafa Abdou, et al. ∙ 05/04/2020
Large-scale pretrained language models are the major driving force behin...

Multimodal Machine Translation through Visuals and Speech
Umut Sulubacak, et al. ∙ 11/28/2019
Multimodal machine translation involves drawing information from more th...

Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Akos Kadar, et al. ∙ 11/09/2019
Recent work has highlighted the advantage of jointly learning grounded s...

Compositional Generalization in Image Captioning
Mitja Nikolaus, et al. ∙ 09/10/2019
Image captioning models are usually evaluated on their ability to descri...

Cross-lingual Visual Verb Sense Disambiguation
Spandana Gella, et al. ∙ 04/10/2019
Recent work has shown that visual context improves cross-lingual sense d...

How2: A Large-scale Dataset for Multimodal Language Understanding
Ramon Sanabria, et al. ∙ 11/01/2018
In this paper, we introduce How2, a multimodal collection of instruction...

Lessons learned in multilingual grounded language learning
Akos Kadar, et al. ∙ 09/20/2018
Recent work has shown how to learn better visual-semantic embeddings by ...

Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Desmond Elliott, et al. ∙ 10/19/2017
We present the results from the second shared task on multimodal machine...

Cross-linguistic differences and similarities in image descriptions
Emiel van Miltenburg, et al. ∙ 07/06/2017
Automatic image description systems are commonly trained and evaluated o...

Imagination improves Multimodal Translation
Desmond Elliott, et al. ∙ 05/11/2017
We decompose multimodal translation into two sub-tasks: learning to tran...

Room for improvement in automatic image description: an error analysis
Emiel van Miltenburg, et al. ∙ 04/13/2017
In recent years we have seen rapid and significant progress in automatic...

Pragmatic factors in image description: the case of negations
Emiel van Miltenburg, et al. ∙ 06/20/2016
We provide a qualitative analysis of the descriptions containing negatio...

Multi30K: Multilingual English-German Image Descriptions
Desmond Elliott, et al. ∙ 05/02/2016
We introduce the Multi30K dataset to stimulate multilingual multimodal r...

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
Raffaella Bernardi, et al. ∙ 01/15/2016
Automatic description generation from natural images is a challenging pr...

Multilingual Image Description with Neural Sequence Models
Desmond Elliott, et al. ∙ 10/15/2015
In this paper we present an approach to multi-language image description...