WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

11/25/2022

∙

This paper presents WALDO (WArping Layer-Decomposed Objects), a novel approach to the prediction of future video frames from past ones. Individual images are decomposed into multiple layers combining object masks and a small set of control points. The layer structure is shared across all frames in each video to build dense inter-frame connections. Complex scene motions are modeled by combining parametric geometric transformations associated with individual layers, and video synthesis is broken down into discovering the layers associated with past frames, predicting the corresponding transformations for upcoming ones and warping the associated object regions accordingly, and filling in the remaining image parts. Extensive experiments on the Cityscapes (resp. KITTI) dataset show that WALDO significantly outperforms prior works with, e.g., 3, 27, and 51 LPIPS and FVD metrics. Code, pretrained models, and video samples synthesized by our approach can be found in the project webpage https://16lemoing.github.io/waldo.

READ FULL TEXT

WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction

Sign in with Google

Consider DeepAI Pro