HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation
While deeply supervised networks are common in recent literature, they typically impose the same learning objective on all transitional layers despite their varying representation powers. In this paper, we propose Hierarchically Supervised Semantic Segmentation (HS3), a training scheme that supervises intermediate layers in a segmentation network to learn meaningful representations by varying task complexity. To enforce a consistent performance vs. complexity trade-off throughout the network, we derive various sets of class clusters to supervise each transitional layer of the network. Furthermore, we devise a fusion framework, HS3-Fuse, to aggregate the hierarchical features generated by these layers, which can provide rich semantic contexts and further enhance the final segmentation. Extensive experiments show that our proposed HS3 scheme considerably outperforms vanilla deep supervision with no added inference cost. Our proposed HS3-Fuse framework further improves segmentation predictions and achieves state-of-the-art results on two large segmentation benchmarks: NYUD-v2 and Cityscapes.
READ FULL TEXT