Dialogue participants may have varying levels of knowledge about the top...
When speakers describe an image, they tend to look at objects before men...
Dialogue participants often refer to entities or situations repeatedly w...
This paper introduces the PhotoBook dataset, a large-scale collection of...
The multimodal models used in the emerging field at the intersection of ...