Relative positional encoding is widely used in vanilla and linear
transf...
Subject-driven text-to-image generation models create novel renditions o...
General-purpose language models that can solve various language-domain t...
Sequence modeling has important applications in natural language process...
Semantic communications are expected to accomplish various semantic task...
The cost of vision-and-language pre-training has become increasingly
pro...
Large language models (LLMs) have demonstrated excellent zero-shot
gener...
Zero-shot relation triplet extraction (ZeroRTE) aims to extract relation...
Linear transformers aim to reduce the quadratic space-time complexity of...
Transparent objects are widely used in industrial automation and daily l...
We introduce LAVIS, an open-source deep learning library for LAnguage-VI...
Transformer has shown great successes in natural language processing,
co...
Vision-Language Pre-training (VLP) has advanced the performance for many...
This work studies the task of glossification, of which the aim is to em
...
Video-and-language pre-training has shown promising improvements on vari...
Recent deep face hallucination methods show stunning performance in
supe...
Video deraining is an important task in computer vision as the unwanted ...
Rain streaks and rain drops are two natural phenomena, which degrade ima...
Video deblurring models exploit consecutive frames to remove blurs from
...
Sign language translation (SLT) aims to interpret sign video sequences i...
Word-level sign language recognition (WSLR) is a fundamental task in sig...
Vision-based sign language recognition aims at helping the hearing-impai...