The input tokens to Vision Transformers carry little semantic meaning as...
Unsupervised representation learning for speech processing has matured
g...
Spoken language understanding (SLU) tasks are usually solved by first
tr...
The two most common paradigms for end-to-end speech recognition are
conn...
Recent advances in self-supervised learning through contrastive training...
We address a challenging and practical task of labeling questions in spe...