Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance
Image outpainting technology generates visually plausible content without guaranteeing authenticity, which makes it unreliable for practical applications even when additional modalities, e.g., sketches, are introduced. Since sparse depth maps are widely captured alongside RGB images in robotics and autonomous systems, we incorporate sparse depth into the image outpainting task to improve reliability. Concretely, we propose a Depth-Guided Outpainting Network (DGONet) that models the feature representations of different modalities separately and learns structure-aware cross-modal fusion. To this end, we design two components: 1) a Multimodal Learning Module that produces distinct depth and RGB feature representations according to the characteristics of each modality, and 2) a Depth Guidance Fusion Module that leverages the completed depth modality to guide the generation of RGB content through progressive multimodal feature fusion. Furthermore, we design an additional constraint strategy consisting of a Cross-modal Loss and an Edge Loss to sharpen ambiguous contours and expedite reliable content generation. Extensive experiments on KITTI demonstrate that our method outperforms state-of-the-art methods with more reliable content generation.
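The constraint strategy can be illustrated with a minimal NumPy sketch. This assumes the Edge Loss is an L1 distance between Sobel edge maps of the predicted and ground-truth images, and the Cross-modal Loss is an analogous L1 consistency between edge maps of the prediction and the depth map; the function names, the weighting `lam`, and these exact formulations are illustrative assumptions, not the paper's verified definitions.

```python
import numpy as np

def sobel_edges(img):
    """Edge magnitude via 3x3 Sobel filters (valid padding, grayscale input)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def edge_loss(pred, gt):
    """L1 distance between edge maps of prediction and ground truth."""
    return np.abs(sobel_edges(pred) - sobel_edges(gt)).mean()

def cross_modal_loss(pred, depth):
    """Hypothetical cross-modal term: edge consistency between RGB and depth."""
    return np.abs(sobel_edges(pred) - sobel_edges(depth)).mean()

def constraint_loss(pred, gt, depth, lam=0.1):
    """Combined constraint: Edge Loss plus weighted Cross-modal Loss."""
    return edge_loss(pred, gt) + lam * cross_modal_loss(pred, depth)
```

In practice such terms would be added to the usual reconstruction and adversarial objectives; the sketch only shows how edge structure from the depth modality can penalize ambiguous contours in the generated region.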