Task planning for robotic cooking involves generating a sequence of acti...
Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistenc...
Picking up multiple objects at once is a grasping skill that makes a hum...
Although the estimation of 3D human pose and shape (HPS) is rapidly prog...
Heterogeneous Graph Neural Networks (HGNNs) have gained significant popu...
Many recent efforts aim to augment language models with relevant informa...
Compressed sensing magnetic resonance imaging (CS-MRI) seeks to recover ...
Node classification is a substantial problem in graph-based fraud detect...
There has been a recent surge of interest in introducing transformers to...
In recent years, there has been an increased popularity in image and spe...
Task-agnostic knowledge distillation attempts to address the problem of ...
Software engineers working with the same programming language (PL) may s...
Detecting sarcasm and verbal irony from people's subjective statements i...
This paper describes our winning system on SemEval 2022 Task 7: Identify...
Recent cross-lingual cross-modal works attempt to extend Vision-Language...
Speech representation learning has improved both speech understanding an...
Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has wid...
Recent progress in diffusion models has revolutionized the popular techn...
Derivative-free prompt learning has emerged as a lightweight alternative...
Embedding-based retrieval (EBR) is a technique to use embeddings to repr...
Recent years have witnessed the rise and success of pre-training techniq...
Recent Vision-Language Pre-trained (VLP) models based on dual encoder ha...
Recent efforts of multimodal Transformers have improved Visually Rich Do...
Test-time training adapts to a new test distribution on the fly by optim...
We develop WOC, a webcam-based 3D virtual online chatroom for multi-pers...
Named entity recognition (NER) is the task of detecting and classifying the en...
Flexible task planning continues to pose a difficult challenge for robot...
This paper proposes 12 multi-object grasp (MOG) types from a human and...
Deep neural networks (DNNs) often rely on massive labelled data for trai...
The ever-growing model size and scale of compute have attracted increasi...
Relational graph neural networks have garnered particular attention to e...
Current face detection algorithms are extremely generalized and can obta...
This paper considers the problem of temporal video interpolation, where ...
The core issue of cyberspace detection and mapping is to accurately iden...
Double-strand DNA breaks (DSBs) are a form of DNA damage that can cause ...
Conventional methods for the image-text generation tasks mainly tackle t...
Pre-trained language models have achieved state-of-the-art results in va...
Calorie and nutrition research has attained increased interest in recent...
Transferring multiple objects between bins is a common task for many app...
Given an image with multiple people, our goal is to directly regress the...
Intention decoding is an indispensable procedure in hands-free human-com...
A major component for developing intelligent and autonomous robots is a ...
In recent years, owing to the outstanding performance in graph represent...
A human hand can grasp a desired number of objects at once from a pile b...
Recent 2D-to-3D human pose estimation works tend to utilize the graph st...
Spectrograms visualize the frequency components of a given signal which ...
This paper discusses recent research progress in robotic grasping and ma...
Pre-trained models have achieved state-of-the-art results in various Nat...
Pretrained language models (PLMs) such as BERT adopt a training paradigm...
The functional object-oriented network (FOON) has been introduced as a k...