Joint speech-language training is challenging due to the large demand fo...
Artificial General Intelligence (AGI) requires comprehensive understandi...
The convergence of text, visual, and audio data is a key step towards hu...
We present Composable Diffusion (CoDi), a novel generative model capable...
The Internet of Production (IoP) leverages concepts such as digital shad...
Large Language Models (LLMs) have shown impressive performance as genera...
We propose MM-REACT, a system paradigm that integrates ChatGPT with a po...
Code-switching speech refers to a means of expression by mixing two or m...
In real application scenarios, it is often challenging to obtain a large...
Knowledge-intensive tasks, such as open-domain question answering (QA), ...
People say, "A picture is worth a thousand words". Then how can we get t...
Semi-supervised learning has shown promise in allowing NLP models to gen...
Human intelligence is multimodal; we integrate visual, linguistic, and a...
Recent developments in large-scale pre-trained language models (PLMs) have...
Vision-language (V+L) pretraining models have achieved great success in ...
In this work, we develop new self-learning techniques with an attention-...
We initiate the first empirical study on the use of MLP architectures fo...
Most of today's AI systems focus on using self-attention mechanisms and ...
Automated visual understanding of our diverse and open world demands com...
Vision-and-language (VL) pre-training has proven to be highly effective ...
Self-supervised learning (SSL) achieves great success in speech recognit...
The advances in attention-based encoder-decoder (AED) networks have brou...
Recently, a trend is emerging toward human-servicing autonomous mobile r...
Commonsense reasoning (CSR) requires the model to be equipped with gener...
Pre-trained language models (PLMs) aim to learn universal language repre...
The speech representations learned from large-scale unlabeled data have ...
Current Open-Domain Question Answering (ODQA) model paradigm often conta...
It is often observed in knowledge-centric tasks (e.g., common sense ques...
Spoken Language Understanding (SLU) is composed of two subtasks: intent ...
Modern Automatic Speech Recognition (ASR) systems can achieve high perfo...
Recently, universal neural machine translation (NMT) with shared encoder...
End-to-end (E2E) spoken language understanding (SLU) can infer semantics...
In this paper, we propose a unified pre-training approach called UniSpee...
Commonsense reasoning requires a model to make presumptions about world ...
LSTM language models (LSTM-LMs) have been proven to be powerful and yiel...
Cross-lingual Summarization (CLS) aims at producing a summary in the tar...
Spoken language understanding (SLU) requires a model to analyze input ac...
Knowledge graphs (KGs) contain rich information about world knowledge, e...
Neural models have become successful at producing abstractive summaries ...
Dialog policy determines the next-step actions for agents and hence is c...
The training of spoken language understanding (SLU) models often faces t...
With the abundance of automatic meeting transcripts, meeting summarizati...
A commonly observed problem with abstractive summarization is the distor...
As a crucial component in task-oriented dialog systems, the Natural Lang...
Text summarization aims to extract essential information from a piece of...
Lead bias is a common phenomenon in news summarization, where early part...
Dialogue state tracking is an important component in task-oriented dialo...
We describe a system that generates speaker-annotated transcripts of mee...