Transitioning between topics is a natural component of human-human dialo...
Sequence to Sequence models, in particular the Transformer, achieve stat...
The audio-visual speech fusion strategy AV Align has shown significant p...
Audio-Visual Speech Recognition (AVSR) seeks to model, and thereby explo...
Automatic speech recognition can potentially benefit from the lip motion...
Finding visual features and suitable models for lipreading tasks that ar...