Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation
Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensure the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing, such as object bounding box detection or pose estimation, and rarely tackle these situations. In this paper, we address this important problem for autonomous driving by solving two critical issues using visual data augmentation. First, to deal with data scarcity, we propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images and then reconstructing human-vehicle interaction scenarios. This allows us to directly edit the real images using the aligned 3D parts, yielding effective training data generation for learning robust deep neural networks (DNNs). Second, to benchmark the quality of 3D part understanding, we collect a large dataset in real-world driving scenarios with vehicles in uncommon states (VUS), e.g., with the door or trunk opened. Experiments demonstrate that our trained network with visual data augmentation largely outperforms other baselines in terms of 2D detection and instance segmentation accuracy. Our network yields large improvements in discovering and understanding these uncommon cases. Moreover, we plan to release all of the source code, the dataset, and the trained model on GitHub.
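To make the part-guided image editing idea concrete, the following is a minimal sketch of how an aligned 3D part (e.g., an opened door) could be projected and composited onto a real image for augmentation. It assumes a fitted part mesh, camera intrinsics, and a rendered part with its mask are already available; all names and values here are hypothetical placeholders, not the paper's actual pipeline.

```python
# Illustrative sketch: project a fitted 3D part and paste its rendering onto
# a real image to synthesize a "vehicle in uncommon state" training sample.
import numpy as np

def project_points(vertices, K, R, t):
    """Project Nx3 part vertices into the image with a pinhole camera model."""
    cam = (R @ vertices.T + t.reshape(3, 1)).T   # world -> camera coordinates
    uv = (K @ cam.T).T                           # camera -> image plane
    return uv[:, :2] / uv[:, 2:3]                # perspective divide

def paste_part(image, part_render, part_mask):
    """Composite a rendered 3D part (e.g., an opened door) onto a real image."""
    out = image.copy()
    out[part_mask] = part_render[part_mask]
    return out

# Toy usage with placeholder data standing in for a fitted door mesh and render.
K = np.array([[720.0, 0.0, 640.0], [0.0, 720.0, 360.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
door_vertices = np.random.rand(100, 3)           # placeholder part mesh
uv = project_points(door_vertices, K, R, t)      # 2D locations of part vertices

image = np.zeros((720, 1280, 3), dtype=np.uint8)          # placeholder real image
render = np.full_like(image, 128)                          # placeholder part rendering
mask = np.zeros(image.shape[:2], dtype=bool)
mask[200:400, 500:700] = True                               # placeholder part mask
augmented = paste_part(image, render, mask)                 # edited training image
```

In the sketch, the augmented image and the projected part geometry would then supply 2D boxes and instance masks as supervision for the DNN, mirroring the data generation step described above.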