The goal of speech enhancement (SE) is to eliminate the background inter...
Edge Intelligence (EI) allows Artificial Intelligence (AI) applications ...
Transformer-based pre-trained language models, such as BERT, achieve gre...
Automatic open-ended long text generation poses significant challeng...
Prior studies diagnose the anisotropy problem in sentence representation...
We present SegGPT, a generalist model for segmenting everything in conte...
Large-scale text-to-image diffusion models achieve unprecedented success...
Meetings are increasingly important for collaborations. Action items in ...
Listening to long video/audio recordings from video conferencing and onl...
ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) ...
Learning from massive speech corpora has led to the recent succes...
Masked Language Modeling (MLM) is widely used to pretrain language model...
By flexibly manipulating the radio propagation environment, reconfigurab...
Multi-modal and multi-hop question answering aims to answer a question b...
In-context learning, as a new paradigm in NLP, allows the model to rapid...
We launch EVA, a vision-centric foundation model to explore the limits o...
The extraction of sequence patterns from a collection of functionally li...
Generating sound effects that humans want is an important topic. However...
A service robot serving safely and politely needs to track the surroundi...
This paper investigates the use of the reconfigurable dual-functional su...
Fusing regression coefficients into homogenous groups can unveil those c...
Transformer-based models have achieved great success in various NLP, vis...
This paper tackles the problem of table structure parsing (TSP) from ima...
Detection transformers have recently shown promising object detection re...
Transcripts generated by automatic speech recognition (ASR) systems for ...
Existing classification-based face recognition methods have achieved rem...
A table arranging data in rows and columns is a very effective data stru...
Named entity recognition (NER) is a well-studied task in natural languag...
In this paper, we investigate the use of linguistically motivated and co...
In the traditional cascading architecture for spoken language understand...
Punctuation prediction for automatic speech recognition (ASR) output tra...
In community-based question answering (CQA) platforms, automatic answer ...
Education has a significant impact on both society and personal life. Wi...
In this letter, we study the secure communication problem in the unmanne...
Video action anticipation aims to predict future action categories from ...
The noetic end-to-end response selection challenge as one track in the 7...
With the increased applications of automatic speech recognition (ASR) in...
Spoken language understanding (SLU) is a key component of task-oriented ...
Session-based target behavior prediction aims to predict the next item t...
Online action detection (OAD) is a practical yet challenging task, which...
In this work, a discriminatively learned CNN embedding is proposed for r...
Neural language representation models such as Bidirectional Encoder Repr...
We propose a learning approach for turn-level spoken language understand...
Deep metric learning aims at learning the distance metric between pairs o...
In this paper, we focus on model generalization and adaptation for cross...
Intent classification and slot filling are two essential tasks for natur...
The noetic end-to-end response selection challenge as one track in Dialo...
Stratifying patients at risk for postoperative complications may facilit...
Temporal action detection aims not only at recognizing the action category b...