We aim for accurate and efficient line landmark detection for valet park...
Recently, the development of pre-trained vision language foundation mode...
Pre-trained vision transformers have strong representation benefits to v...
Current research on cross-modal retrieval is mostly English-oriented, as...
Unsupervised contrastive learning methods have recently seen significant...
Large language models, particularly those akin to the rapidly progressin...
Person Re-identification (ReID) plays an increasingly crucial role in re...
Accurate segmentation of punctate white matter lesions (PWMLs) is funda...
In the contemporary landscape of social media, an alarming number of use...
The transformer-based semantic segmentation approaches, which divide the...
Modern supervised semantic segmentation methods are usually finetuned ba...
Brain tissue segmentation is essential for neuroscience and clinical stu...
Vision Transformers (ViTs) are normally regarded as a stack of transform...
In this work, we investigate extending the comprehension of Multi-modal ...
Vision transformers (ViT) usually extract features via forwarding all th...
Since the advent of Neural Radiance Fields, novel view synthesis has rec...
We study the multilayer random dot product graph (MRDPG) model, an exten...
Visual retrieval tasks such as image retrieval and person re-identificat...
Data-driven medium-range weather forecasting has attracted much attentio...
Meta-learning algorithms are able to learn a new task using previously l...
With the progress of 3D human pose and shape estimation, state-of-the-ar...
Signal region detection is one of the challenging problems in modern sta...
Vision Transformers have shown great potential in computer vision tasks....
Localizing people and recognizing their actions from videos is a challen...
This paper introduces Amazon Robotic Manipulation Benchmark (ARMBench), ...
Parameter-Efficient Transfer Learning (PETL) aims at efficiently adaptin...
Recent advances in digitization have led to the availability of multivariate ...
Despite the promising results, existing oriented object detection method...
Model attribution is a critical component of deep neural networks (DNNs)...
Modern incremental learning for semantic segmentation methods usually le...
It's a meaningful and attractive topic to build a general and inclusive ...
Existing semantic segmentation works have been mainly focused on designi...
Deploying reliable deep learning techniques in interdisciplinary applica...
The quality of knowledge retrieval is crucial in knowledge-intensive con...
Channel and spatial attention mechanisms have proven to provide an evident...
Vehicle re-identification (Re-ID) is a critical component of the autonom...
Surround-view fisheye perception under valet parking scenes is fundament...
Motion recognition is a promising direction in computer vision, but the ...
Recently, the practical deployment of open-domain dialogue systems has b...
In the driving scene, the road participants usually show frequent intera...
Bionic robots are generally considered to have strong flexibility, adapt...
Existing pipelined task-oriented dialogue systems usually have difficult...
Vision Transformers (ViTs) have shown promising performance compared wit...
Many open-domain dialogue models pre-trained with social media comments ...
Molecular property prediction is a fundamental task in the drug and mate...
AI-based protein structure prediction pipelines, such as AlphaFold2, hav...
We study the backward compatible problem for person re-identification (R...
Accurate protein structure prediction can significantly accelerate the d...
Predicting clinical outcomes to anti-cancer drugs on a personalized basi...
Generative open-domain dialogue systems can benefit from external knowle...