This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tu...
Existing methods of multiple human parsing (MHP) apply statistical model...
Out-of-distribution (OOD) detection aims to detect "unknown" data whose ...
Scene graph generation aims to detect visual relationship triplets, (sub...
The task of dynamic scene graph generation (DynSGG) aims to generate sce...
Existing Unbiased Scene Graph Generation (USGG) methods only focus on ad...
Current Scene Graph Generation (SGG) methods explore contextual informat...
Test-time task adaptation in few-shot learning aims to adapt a pre-train...
Due to the gap between a substitute model and a victim model, the gradie...
Few-shot classification consists of a training phase where a model is le...
Generating consecutive descriptions for videos, i.e., Video Captioning, ...
Studies of image captioning are shifting towards a trend of a fully end-...
As a crucial approach for compact representation learning, hashing has a...
Unrestricted color attacks, which manipulate semantically meaningful col...
Existing methods of multiple human parsing usually adopt a two-stage str...
Scene graph generation (SGG) is a fundamental task aimed at detecting vi...
The current studies of Scene Graph Generation (SGG) focus on solving the...
For black-box attacks, the gap between the substitute model and the vict...
The performance of current Scene Graph Generation (SGG) models is severe...
Skeleton-based action recognition aims to project skeleton sequences to ...
Scene Graph Generation (SGG) represents objects and their interactions w...
Part-level attribute parsing is a fundamental but challenging task, whic...
Human densepose estimation, aiming at establishing dense correspondences...
Recently, attention-based Visual Question Answering (VQA) has achieved g...
To date, visual question answering (VQA) (i.e., image QA and video QA) i...
Video captioning is a challenging task that necessitates a thorough comp...
The performance of current Scene Graph Generation models is severely ham...
Modeling latent variables with priors and hyperpriors is an essential pr...
Defense models against adversarial attacks have grown significantly, but...
In recent years, the adversarial vulnerability of deep neural networks (...
Automatically describing videos with natural language is a fundamental c...
Scene graph generation (SGG) is built on top of detected objects to pred...
As a structured representation of the image content, the visual scene gr...
Part-level Action Parsing aims at part state parsing for boosting action...
Adversarial attacks make their success in fooling DNNs and among them, g...
Due to the vulnerability of deep neural networks (DNNs) to adversarial e...
The scene graph generation (SGG) task aims to detect visual relationship...
Abundant real-world data can be naturally represented by large-scale net...
Learning accurate low-dimensional embeddings for a network is a crucial ...
Scene graphs provide valuable information to many downstream tasks. Many...
Human-Object Interaction (HOI) detection is a fundamental visual task ai...
By adding human-imperceptible perturbations to images, DNNs can be easil...
Crafting adversarial examples for the transfer-based attack is challengi...
Although great progress has been made on adversarial attacks for deep ne...
By adding human-imperceptible noise to clean images, the resultant adver...
Despite the huge progress in scene graph generation in recent years, its...
In this paper, we consider a novel task, Spatio-Temporal Video Grounding...
The data storage has been one of the bottlenecks in surveillance systems...
With the recent explosive increase of digital data, image recognition an...
Quantization has been an effective technology in ANN (approximate neares...