Zero-shot object navigation is a challenging task for home-assistance ro...
We present a novel learning algorithm for trajectory generation for outd...
We present ImageBind-LLM, a multi-modality instruction tuning method of ...
We introduce Point-Bind, a 3D multi-modality model aligning point clouds...
Large language models (LLMs) have revolutionized natural language proces...
Recent advancements in Large Vision-Language Models (LVLMs) have demonst...
Recently, deep learning-based beamforming algorithms have shown promisin...
Large Vision-Language Models (LVLMs) have recently played a dominant rol...
Recently, video object segmentation (VOS) referred to by multi-modal signal...
Foundation models have made significant strides in various applications,...
Although the Domain Generalization (DG) problem has been fast-growing in the...
Driven by large-data pre-training, Segment Anything Model (SAM) has been...
How to efficiently transform large language models (LLMs) into instructi...
Filter pruning is widely adopted to compress and accelerate the Convolut...
Occlusion between objects is one of the overlooked challenges for object...
The popularity of Contrastive Language-Image Pre-training (CLIP) has pro...
The recent detection transformer (DETR) has advanced object detection, b...
We present LLaMA-Adapter, a lightweight adaption method to efficiently f...
In this paper, we investigate representation learning for low-resource k...
We present a Non-parametric Network for 3D point cloud analysis, Point-N...
Masked Autoencoders (MAE) have been popular paradigms for large-scale vi...
Visual recognition in low-data regimes requires deep neural networks to ...
One-to-one matching is a crucial design in DETR-like object detection fr...
Binary neural networks (BNNs) have received ever-increasing popularity f...
Despite the increased adoption of open-source cyber threat intelligence ...
Drug-Drug Interactions (DDIs) prediction is an essential issue in the mo...
Pre-training by numerous image data has become de-facto for robust 2D re...
Contrastive Language-Image Pre-training (CLIP) has shown promising open-...
Masked Autoencoders (MAE) have been prevailing paradigms for large-scale...
System auditing has emerged as a key approach for monitoring system call...
Contrastive learning has emerged as a powerful tool for graph representa...
The large pre-trained vision transformers (ViTs) have demonstrated remar...
Knowledge distillation (KD) has been proven to be useful for training co...
Few-shot classification requires deep neural networks to learn generaliz...
Binary Neural Networks (BNNs) show great promise for real-world embedded...
Video recognition has been dominated by the end-to-end learning paradigm...
Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...
Unsupervised Domain Adaptation (UDA) aims to adapt the model trained on ...
Challenging illumination conditions (low light, underexposure and overex...
Large-scale pre-training models have been widely used in named entity re...
Masked Autoencoders (MAE) have shown great potentials in self-supervised...
Mobile communication standards were developed for enhancing transmission...
This paper describes the PASH participation in TREC 2021 Deep Learning T...
Vision Transformers (ViT) have become widely-adopted architectures for variou...
Recently, the pre-training paradigm combining Transformer and masked lan...
Keyword spotting (KWS) and speaker verification (SV) are two important t...
Monocular 3D object detection has long been a challenging task in autono...
Anti-cancer drug discoveries have been serendipitous; we sought to prese...
In this paper, we propose a simple and general framework for self-superv...
It is a challenging task to learn discriminative representation from ima...