The pre-training task is indispensable for the text-to-image person
re-i...
Ultrasound (US) image segmentation is an active research area that requi...
Motivated by numerically modeling surface waves for inviscid Euler equat...
The challenge posed by multimodal named entity recognition (MNER) is mai...
Image-grounded dialogue systems benefit greatly from integrating visual
...
Stroke extraction of Chinese characters plays an important role in the f...
One of the mainstream schemes for 2D human pose estimation (HPE) is lear...
Deep learning (DL) has proven highly effective for ultrasound-based
comp...
Deep classifiers may encounter significant performance degradation when
...
Diffusion Probabilistic Models (DPMs) have recently shown remarkable
per...
Medical dialogue systems (MDS) aim to provide patients with medical serv...
Goal-directed dialogue systems aim to proactively reach a pre-determined...
Artificial Intelligence Generated Content (AIGC) has garnered considerab...
In-Context Learning (ICL) aims to understand a new task via a few
demo...
This paper tackles spectral reflectance recovery (SRR) from RGB images. ...
Existing methods of multi-person video 3D human Pose and Shape Estimatio...
Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used in va...
This paper presents a novel predictive model, MetaMorph, for metamorphic...
Deep Learning (DL) is being applied in various domains, especially in
sa...
Facial images captured at close distances often suffer from
per...
Molecular dynamics is the primary computational method by which modern
s...
Multi-agent reinforcement learning (MARL) has achieved great progress in...
In recent years, multi-agent reinforcement learning has achieved remark...
Communication in multi-agent reinforcement learning has been drawing
att...
As the number of open and shared scientific datasets on the Internet
inc...
In the field of skeleton-based action recognition, current top-performin...
Constructing click models and extracting implicit relevance feedback
inf...
Egocentric 3D human pose estimation with a single head-mounted fisheye c...
Conversational recommender systems (CRS) aim to employ natural language
...
It is common sense that datasets with high-quality data samples play a...
We present a robust, privacy-preserving visual localization algorithm us...
Extracting class activation maps (CAM) is a key step for weakly-supervis...
Masked image modeling (MIM) learns visual representation by masking and
...
We present a strong object detector with encoder-decoder pretraining and...
With the advent of the electric power big data era, semantic interoperab...
Deformable shapes provide important and complex geometric features of ob...
High resolution and advanced semantic representation are both vital for ...
Recently, transformer-based networks have shown impressive results in
se...
Deploying ultra-dense networks that operate on millimeter wave (mmWave) ...
Flocking control is a significant problem in multi-agent systems such as...
Flocking control is a challenging problem, where multiple agents, such a...
In this paper, we study the problem of one-shot skeleton-based action
re...
Recommendation dialogue systems aim to build social bonds with users and...
Video 3D human pose estimation aims to localize the 3D coordinates of hu...
We present UnrealEgo, a new large-scale naturalistic dataset for
e...
Flash illumination is widely used in imaging under low-light environment...
Multi-person 3D pose estimation is a challenging task because of occlusi...
This paper proposes a novel Unified Feature Optimization (UFO) paradigm ...
Text-to-image person re-identification (ReID) aims to search for pedestr...
Localization and navigation are basic robotic tasks requiring an accurat...