Fine-grained visual classification (FGVC) involves categorizing fine sub...
Non-photorealistic videos are in demand with the wave of the metaverse, ...
GAN inversion is indispensable for applying the powerful editability of ...
Domain adaptation is commonly employed in crowd counting to bridge the d...
Multi-view (or -modality) representation learning aims to understand the...
Synthesizing novel views from a single view image is a highly ill-posed ...
The presence of non-homogeneous haze can cause scene blurring, color dis...
Considering the ill-posed nature, contrastive regularization has been de...
Previous Knowledge Distillation based efficient image retrieval methods ...
HD map reconstruction is crucial for autonomous driving. LiDAR-based met...
We resolve the ill-posed alpha matting problem from a completely differe...
Despite the demonstrated editing capacity in the latent space of a pretr...
Data mixing (e.g., Mixup, CutMix, ResizeMix) is an essential component f...
Crowd images are arguably among the most laborious data to annotate. In t...
We present a novel high-resolution face swapping method using the inhere...
Recent Vision Transformer (ViT) models have demonstrated encouraging res...
Ultra-high resolution image segmentation has raised increasing interests...
Existing domain adaptation methods for crowd counting view each crowd im...
Labeling is onerous for crowd counting as it should annotate each indivi...
The fully convolutional network (FCN) has dominated salient object detec...
Existing GAN inversion methods are stuck in a paradox that the inverted ...
Transformers have recently been adapted from the community of natural language...
Image matting is an ill-posed problem that usually requires additional u...
This paper proposes a novel pretext task to address the self-supervised ...
In this paper, we propose a simple yet effective approach, named Triple ...
In recent years, vision-based crowd analysis has been studied extensivel...
Temporal repetition counting aims to estimate the number of cycles of a ...
While deep neural networks have been shown to perform remarkably well in...
In this paper, we propose a novel iterative multi-task framework to comp...
We address the problem of video representation learning without human-an...
We address the problem of restoring a high-resolution face image from a ...
The tracking-by-detection framework receives growing attention through ...
Vision-based vehicle detection approaches achieve incredible success in ...
Egocentric videos, which mainly record the activities carried out by the...
We address the problem of transferring the style of a headshot photo to ...
We propose a two-stage method for face hallucination. First, we generate...
Numerous efforts have been made to design different low level saliency c...