Unsupervised Multi-object Segmentation Using Attention and Soft-argmax

05/26/2022
by Bruno Sauvalle, et al.

We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation. It uses an attention mechanism to associate a feature vector with each object present in the scene and to predict the coordinates of these objects using soft-argmax. A transformer encoder handles occlusions and redundant detections, and a separate pre-trained background model is in charge of background reconstruction. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks and provide examples of applications to real-world traffic videos.
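The two mechanisms named above can be illustrated with a minimal NumPy sketch. Attention pooling yields one feature vector per object, and soft-argmax turns each attention map into coordinates. This is not the authors' implementation. The dot-product attention form, the per-object `queries`, the `temperature` parameter, and the function names `softmax_2d` and `attend_and_locate` are hypothetical choices made for illustration. The transformer encoder and the background model are omitted.

```python
import numpy as np

def softmax_2d(logits):
    """Softmax over the spatial axes of an array shaped (K, H, W)."""
    flat = logits.reshape(logits.shape[0], -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # subtract max for numerical stability
    weights = np.exp(flat)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.reshape(logits.shape)

def attend_and_locate(features, queries, temperature=1.0):
    """Hypothetical sketch, not the paper's code.

    features: (C, H, W) feature map, assumed to come from some image encoder.
    queries:  (K, C) one query vector per object slot (assumed, for illustration).
    Returns (K, C) object feature vectors and (K, 2) (x, y) coordinates in [0, 1].
    """
    C, H, W = features.shape
    # Dot-product attention logits between each query and each pixel's feature vector.
    logits = np.einsum("kc,chw->khw", queries, features) / temperature
    attn = softmax_2d(logits)  # (K, H, W); each map sums to 1
    # Attention-weighted average of pixel features gives one vector per object.
    obj_feats = np.einsum("khw,chw->kc", attn, features)
    # Soft-argmax: expected pixel coordinate under each attention map.
    xs = np.linspace(0.0, 1.0, W)
    ys = np.linspace(0.0, 1.0, H)
    x = np.einsum("khw,w->k", attn, xs)
    y = np.einsum("khw,h->k", attn, ys)
    return obj_feats, np.stack([x, y], axis=1)

# Usage: one feature channel with a sharp peak at row 3, column 1 of a 5x5 map.
feats = np.zeros((1, 5, 5))
feats[0, 3, 1] = 50.0
vecs, coords = attend_and_locate(feats, np.array([[1.0]]))
print(coords)  # approximately [[0.25 0.75]]
```

The coordinates are an expectation over the attention map rather than a hard argmax. They are therefore differentiable with respect to the attention logits, so a coordinate prediction of this kind can be trained end to end by gradient descent.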
