Productive Performance Engineering for Weather and Climate Modeling with Python
Earth system models are developed with a tight coupling to target hardware and often contain highly specialized code predicated on processor characteristics. This coupling stems from the use of imperative languages, which hard-code computation schedules and data layout. In this work, we present a detailed account of optimizing the Finite Volume Cubed-Sphere (FV3) weather model in a way that improves both productivity and performance. By using a declarative Python-embedded stencil DSL and data-centric optimization, we abstract away hardware-specific details and define a semi-automated workflow for analyzing and optimizing weather and climate applications. The workflow combines local optimization, full-program optimization, and user-guided fine-tuning. To prune the infeasible global optimization space, we automatically exploit repeating code motifs via a novel transfer tuning approach. On the Piz Daint supercomputer, we achieve speedups of up to 3.92x using GPUs over the tuned production implementation, using a fraction of the original code.
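To illustrate the contrast between hard-coded schedules and a declarative formulation, the following is a minimal sketch in plain NumPy. It is not the stencil DSL used in this work; the function names and the 5-point Laplacian are hypothetical and chosen only for illustration. The first function fixes the loop order and traversal in the source, while the second states only the per-point relation and leaves the iteration order to the underlying array library.

```python
import numpy as np


def laplacian_imperative(field):
    """5-point Laplacian with the loop order fixed in the source code."""
    ni, nj = field.shape
    out = np.zeros_like(field)
    # The i-outer, j-inner schedule is hard-coded here; changing it for a
    # different processor means rewriting the loops.
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            out[i, j] = (
                field[i - 1, j] + field[i + 1, j]
                + field[i, j - 1] + field[i, j + 1]
                - 4.0 * field[i, j]
            )
    return out


def laplacian_declarative(field):
    """Same stencil written as a relation between shifted views of the field."""
    out = np.zeros_like(field)
    # Only the neighbor offsets are specified; no traversal order is stated.
    out[1:-1, 1:-1] = (
        field[:-2, 1:-1] + field[2:, 1:-1]
        + field[1:-1, :-2] + field[1:-1, 2:]
        - 4.0 * field[1:-1, 1:-1]
    )
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = rng.random((8, 8))
    # Both formulations compute the same interior values.
    print(np.allclose(laplacian_imperative(field), laplacian_declarative(field)))
```

Because the second form does not commit to a schedule or layout, a compiler or backend is in principle free to choose one per target, which is the kind of separation the declarative approach described above relies on.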