Robot-assisted surgery has made significant progress, with instrument
se...
Adverse drug reaction (ADR) detection is an essential task in the medica...
Spiking Neural Networks (SNNs) are well known as a promising energy-effi...
The conventional recipe for Automatic Speech Recognition (ASR) models is...
Image-grounded dialogue systems benefit greatly from integrating visual
...
This paper focuses on term-status pair extraction from medical dialogues...
Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promis...
The fast spread of hate speech on social media impacts the Internet
envi...
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning...
Self-assembled InAs/GaAs quantum dots (QDs) have properties highly valua...
The lack of façade structures in photogrammetric mesh models renders the...
Employing additional multimodal information to improve automatic speech
...
Typically, the Time-Delay Neural Network (TDNN) and Transformer can serv...
The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized l...
Adaptive human-agent and agent-agent cooperation are becoming more and m...
The widespread dissemination of toxic online posts is increasingly damag...
Large language models (LLMs) have demonstrated remarkable language abili...
Most urban applications necessitate building footprints in the form of
c...
The accurate representation of 3D building models in urban environments ...
The Medical Slot Filling (MSF) task aims to convert medical queries into
str...
The spiking neural network (SNN) using leaky integrate-and-fire (LIF)
n...
Large-scale pre-trained language models (PLMs) with powerful language
mo...
Learning from interaction is the primary way biological agents know ...
Performance of trimap-free image matting methods is limited when trying ...
This paper presents an in-depth study on a Sequentially Sampled Chunk
Co...
In multi-talker scenarios such as meetings and conversations, speech
pro...
Network architectures and learning principles play a key role in forming...
Complex nonlinear interplays of multiple scales give rise to many intere...
Benefiting from the event-driven and sparse spiking characteristics of t...
Speaker change detection is an important task in multi-party interaction...
Session-based recommendation aims to predict items that an anonymous use...
Most automatic matting methods try to separate the salient foreground fr...
Visual Dialog is a challenging vision-language task since the visual dia...
The monocular Visual-Inertial Odometry (VIO) based on the direct method ...
Currently, there are mainly three Transformer encoder based streaming En...
In recent years, spiking neural networks (SNNs) have received extensive
...
Most existing CNN-based salient object detection methods can identify lo...
Autonomous exploration is an important part of achieving the
auto...
In the past few years, the emergence of pre-training models has brought
...
Network architectures and learning principles are key in forming complex...
Nowadays, most methods in end-to-end contextual speech recognition bias ...
Deep learning methods are notoriously data-hungry, which requires a larg...
Many real-world scenarios involve a team of agents that have to coordina...
Deep learning based models have significantly improved the performance o...
The past several years have witnessed significant progress in modeling t...
Several video-based 3D pose and shape estimation algorithms have been
pr...
Most existing human matting algorithms tried to separate pure human-only...
Approximate nearest neighbor (ANN) search is a fundamental problem in ar...
Abnormal states in deep reinforcement learning (RL) are states that are
...
Learning to coordinate among multiple agents is an essential problem in
...