BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling

12/03/2020
by   Jing Su, et al.
0

Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations, taking input from the bottom LSTM. Experimental results demonstrate that our model outperforms most closely related baselines under automatic evaluation metrics BLEU and CIDEr, and also show the effectiveness of our method with human evaluation.

READ FULL TEXT

page 5

page 12

research
09/26/2019

A Hierarchical Approach for Visual Storytelling Using Image Description

One of the primary challenges of visual storytelling is developing techn...
research
12/02/2020

Generating Descriptions for Sequential Images with Local-Object Attention and Global Semantic Context Modelling

In this paper, we propose an end-to-end CNN-LSTM model for generating de...
research
05/31/2020

"Judge me by my size (noun), do you?” YodaLib: A Demographic-Aware Humor Generation Framework

The subjective nature of humor makes computerized humor generation a cha...
research
09/05/2019

Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation

Recently, automatic image caption generation has been an important focus...
research
04/28/2021

MelBERT: Metaphor Detection via Contextualized Late Interaction using Metaphorical Identification Theories

Automated metaphor detection is a challenging task to identify metaphori...
research
10/13/2016

Video Fill in the Blank with Merging LSTMs

Given a video and its incomplete textural description with missing words...
research
10/10/2016

Neural Paraphrase Generation with Stacked Residual LSTM Networks

In this paper, we propose a novel neural approach for paraphrase generat...

Please sign up or login with your details

Forgot password? Click here to reset