Joint speech-language training is challenging due to the large demand fo...
Artificial General Intelligence (AGI) requires comprehensive understandi...
The convergence of text, visual, and audio data is a key step towards hu...
Human intelligence is multimodal; we integrate visual, linguistic, and a...
Most of today's AI systems focus on using self-attention mechanisms and ...
Automated visual understanding of our diverse and open world demands com...
With the recent surge in the use of video conferencing tools, providing high-...
Personalized speech enhancement (PSE) models utilize additional cues, su...
In this paper, we propose a unified pre-training approach called UniSpee...
Commonsense reasoning requires a model to make presumptions about world ...
Cross-lingual Summarization (CLS) aims at producing a summary in the tar...
Neural models have become successful at producing abstractive summaries ...
With the abundance of automatic meeting transcripts, meeting summarizati...
A commonly observed problem with abstractive summarization is the distor...
A commonly observed problem with abstractive summarization is the distor...
Text summarization aims to extract essential information from a piece of...
Lead bias is a common phenomenon in news summarization, where early part...
This paper describes a system that generates speaker-annotated transcrip...
Dialogue state tracking is an important component in task-oriented dialo...
We describe a system that generates speaker-annotated transcripts of mee...
Conversational question answering (CQA) is a novel QA task that requires...
Machine translation has made rapid advances in recent years. Millions of...