Studies have shown that modern neural networks tend to be poorly calibra...
Word Sense Disambiguation (WSD) is an NLP task aimed at determining the
...
Video-grounded Dialogue (VGD) aims to decode an answer sentence to a que...
Video moment retrieval (VMR) aims to localize target moments in untrimme...
A video-grounded dialogue system referred to as the Structured Co-refere...
Video Moment Retrieval (VMR) is a task to localize the temporal moment i...