Multi-modal large language models (MLLMs) are trained based on large lan...
Modern hierarchical vision transformers have added several vision-specif...
There has been a longstanding belief that generation can facilitate a tr...
Synthetic data has emerged as a promising source for 3D human research a...
This paper presents a simple and effective visual prompting method for a...
Video-language pre-training is crucial for learning powerful multi-modal...
This paper studies the potential of distilling knowledge from pre-traine...
The resolution of feature maps is critical for medical image segmentatio...
Image pre-training, the current de-facto paradigm for a wide range of vi...
Existing commonsense knowledge bases often organize tuples in an isolate...
Recent advances in self-supervised contrastive learning yield good image...
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-...
Electroencephalogram (EEG) recordings are often contaminated with artifa...
The phase function is a key element of a light propagation model for Mon...
The success of language Transformers is primarily attributed to the pret...
Semi-supervised learning on class-imbalanced data, although a realistic ...
EEG source localization is an important technical issue in EEG analysis....
A simile is a figure of speech that directly makes a comparison, showing...
Recently proposed neural architecture search (NAS) algorithms adopt neur...
Contrastive learning has been adopted as a core method for unsupervised ...
To model diverse responses for a given post, one promising way is to int...
Neural architecture search (NAS) is a promising method for automatically...
Learning visual features from unlabeled image data is an important yet c...
Computer vision is difficult, partly because the mathematical function c...
The Retinex model is an effective tool for low-light image enhancement. It a...