Integrating vision and language has gained notable attention following t...
Speech summarization is typically performed by using a cascade of speech...
Sign Language is the primary means of communication for the majority of ...
Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are...
In this paper, we study abstractive summarization for open-domain videos...
End-to-end acoustic-to-word speech recognition models have recently gain...
An increasing number of datasets contain multiple views, such as video, ...
Humans are capable of processing speech by making use of multiple sensor...
In this paper, we introduce How2, a multimodal collection of instruction...
Acoustic-to-Word recognition provides a straightforward solution to end-...
Transcription or sub-titling of open-domain videos is still a challengin...
We summarize the accomplishments of a multi-disciplinary workshop explor...
There is a great need for technologies that can predict the mortality of...