Xiangyu Zhang
Researcher at Megvii
A new trend in the computer vision community is to capture objects of in...
Masked image modeling (MIM) has become a prevalent pre-training setup fo...
Binary similarity analysis determines if two binary executables are from...
Due to the competitive environment, mobile apps are usually produced und...
Deep learning has been widely adopted to tackle various code-based tasks...
Recently 3D object detection from surround-view images has made notable ...
This paper presents a module, Spatial Cross-scale Convolution (SCSC), wh...
In many recommender problems, a handful of popular items (e.g. movies/TV...
Backdoor attacks have emerged as a prominent threat to natural language ...
Referring video object segmentation (RVOS) aims at segmenting an object ...
In multi-timescale multi-agent reinforcement learning (MARL), agents int...
Software specifications are essential for ensuring the reliability of so...
Decompilation aims to recover the source code form of a binary executabl...
Reusing off-the-shelf code snippets from online repositories is a common...
Although end-to-end multi-object trackers like MOTR enjoy the merits of ...
Reverse engineering of protocol message formats is critical for many sec...
Inferring protocol formats is critical for many security applications. H...
Multi-sensor fusion (MSF) is widely adopted for perception in autonomous...
We present view-synthesis autoencoders (VSA) in this paper, which is a s...
DETR has set up a simple end-to-end pipeline for object detection by for...
Self-supervised learning in computer vision trains on unlabeled data, su...
In this paper, we propose a long-sequence modeling framework, named Stre...
3D object detectors usually rely on hand-crafted proxies, e.g., anchors ...
Due to the development of pre-trained language models, automated code ge...
Existing referring understanding tasks tend to involve the detection of ...
Mainstream 3D representation learning approaches are built upon contrast...
Automated Program Repair (APR) improves software reliability by generati...
We develop a novel statistical approach to identify emission features or...
Monocular Depth Estimation (MDE) is a critical component in applications...
Deep Learning backdoor attacks have a threat model similar to traditiona...
In this paper, we propose a robust 3D detector, named Cross Modal Transf...
A recent study has shown a phenomenon called neural collapse in that the...
We propose a new neural network design paradigm Reversible Column Networ...
We conduct a systematic study of backdoor vulnerabilities in normally tr...
Extremely large-scale massive MIMO (XL-MIMO) has been reviewed as a prom...
Purpose: Digital twins are virtual interactive models of the real world,...
This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) v...
In this paper, we propose MOTRv2, a simple yet effective pipeline to boo...
We present our 1st place solution to the Group Dance Multiple People Tra...
Federated Learning (FL) is a distributed learning paradigm that enables ...
Unit type errors, where values with physical unit types (e.g., meters, h...
Large-scale language models are trained on a massive amount of natural l...
Lyrics recognition is an important task in music processing. Despite tra...
Differentiable architecture search (DARTS) has significantly promoted th...
Recently, Masked Image Modeling (MIM) achieves great success in self-sup...
We focus on better understanding the critical factors of augmentation-in...
Deep learning has substantially boosted the performance of Monocular Dep...
Videos are prone to tampering attacks that alter the meaning and deceive...
Recent advances in 2D CNNs and vision transformers (ViTs) reveal that la...
Pervasive backdoors are triggered by dynamic and pervasive input perturb...