We propose a novel end-to-end document understanding model called SeRum ...
Video temporal character grouping locates appearing moments of major cha...
We exploit the potential of the large-scale Contrastive Language-Image P...
We propose MemoChat, a pipeline for refining instructions that enables l...
Temporal sentence grounding (TSG) aims to locate a specific moment from ...
Image retrieval aims to find images from a database that are visually...
Multimodal Large Language Model (MLLM) has recently become a new rising re...
Multimodal Large Language Model (MLLM) relies on the powerful LLM to per...
Text recognition in the wild is a long-standing problem in computer visi...
Structured text extraction is one of the most valuable and challenging a...
Recently, the Table Structure Recognition (TSR) task, aiming at identifying ...
Co-salient object detection (Co-SOD) aims at discovering the common obje...
Vision transformers (ViTs) are changing the landscape of object detectio...
Recently, Vision Transformer (ViT) has achieved remarkable success in se...
Batch normalization (BN) is widely used in modern deep neural networks, ...
Although residual connection enables training very deep neural networks,...
Personalized video highlight detection aims to shorten a long video to i...
In this paper, we focus on recognizing 3D shapes from arbitrary views, i...
Vision transformers have recently gained explosive popularity, but the...
Deep Neural Networks are vulnerable to adversarial examples (Figure 1...
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Di...
Space-time video super-resolution (STVSR) aims to increase the spatial a...
While self-supervised representation learning (SSL) has received widespr...
Towards better unsupervised domain adaptation (UDA). Recently, researche...
Text-based image retrieval has seen considerable progress in recent year...
Conventional semi-supervised learning (SSL) methods, e.g., MixMatch, ach...
Text-based person search aims at retrieving the target person in an image ga...
Current training objectives of existing person Re-IDentification (ReID) ...
Human-annotated labels are often prone to noise, and the presence of suc...
Pruning has become a very powerful and effective technique to compress a...
Self-supervised learning has shown great potential in improving the vid...
One significant factor we expect video representation learning to ca...
Although Person Re-Identification has made impressive progress, difficul...
In the conventional person Re-ID setting, it is widely assumed that crop...
Greedy-NMS inherently raises a dilemma, where a lower NMS threshold will...
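For readers unfamiliar with Greedy-NMS, the following is a minimal, generic sketch of standard greedy non-maximum suppression, not the method of the paper summarized above. The function names `iou` and `greedy_nms` and the `(x1, y1, x2, y2)` box format are illustrative assumptions. It shows how a single IoU threshold decides whether an overlapping, lower-scoring box survives.

```python
from typing import List, Tuple

# Hypothetical box format (assumption): (x1, y1, x2, y2), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], thresh: float) -> List[int]:
    """Standard greedy NMS: return indices of kept boxes, highest score first."""
    # Visit candidates in descending score order.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if its IoU with every already-kept (higher-scoring)
        # box is at most the threshold; otherwise it is suppressed.
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes (IoU = 90/110, about 0.82) and one isolated box.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, 0.5))  # [0, 2]
print(greedy_nms(boxes, scores, 0.9))  # [0, 1, 2]
```

With threshold 0.5 the second box is suppressed, and with 0.9 it is kept. Whether that suppression is correct depends on whether the two boxes cover one object or two adjacent objects, which a single fixed threshold cannot tell apart.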
With a fixed model structure, knowledge distillation and filter grafting...
This paper proposes a new learning paradigm called filter grafting, whic...
Person re-identification (re-ID) is a challenging task due to the high ...
Although great progress in supervised person re-identification (Re-ID) h...
Recently, research interest in person re-identification (ReID) has g...
We consider the problem of high-dimensional light field reconstruction a...
Most existing Re-IDentification (Re-ID) methods are highly dependent on ...
This technical report proves components consistency for the Doubly Stoch...