End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations
In this paper, we present algorithms and implementations for the end-to-end GPU acceleration of matrix-free low-order-refined preconditioning of high-order finite element problems. The methods described here allow for the construction of effective preconditioners for high-order problems with optimal memory usage and computational complexity. The preconditioners are based on the construction of a spectrally equivalent low-order discretization on a refined mesh, which is then amenable to, for example, algebraic multigrid preconditioning. The constants of equivalence are independent of mesh size and polynomial degree. For vector finite element problems in H(curl) and H(div) (e.g., for electromagnetic or radiation diffusion problems), a specially constructed interpolation-histopolation basis is used to ensure fast convergence. Detailed performance studies are carried out to analyze the efficiency of the GPU algorithms. The kernel throughput of each of the main algorithmic components is measured, and the strong and weak parallel scalability of the methods is demonstrated. The differing relative weight and significance of the algorithmic components on GPUs and CPUs are discussed. Results on problems involving adaptively refined nonconforming meshes are shown, and the use of the preconditioners on a large-scale magnetic diffusion problem using all spaces of the finite element de Rham complex is illustrated.
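To make the notion of spectral equivalence concrete, the following is a minimal CPU sketch in NumPy, not the paper's GPU implementation. It assumes the simplest possible setting: the 1D Laplacian on a single element [-1, 1] with homogeneous Dirichlet conditions, a degree-p nodal basis on Gauss-Lobatto nodes, and a low-order-refined mesh whose vertices are those same nodes. All function names are hypothetical and chosen for illustration only. The script computes the generalized eigenvalues of the high-order stiffness matrix with respect to the low-order-refined one; if the two discretizations are spectrally equivalent, these eigenvalues should stay within a fixed interval as p grows.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre


def gll_nodes_weights(p):
    """Gauss-Lobatto-Legendre nodes and quadrature weights on [-1, 1], p >= 2."""
    Pp = Legendre.basis(p)
    # Interior nodes are the roots of P_p'; the endpoints are added explicitly.
    interior = np.sort(np.real(Pp.deriv().roots()))
    x = np.concatenate(([-1.0], interior, [1.0]))
    w = 2.0 / (p * (p + 1) * Pp(x) ** 2)
    return x, w


def diff_matrix(x):
    """D[i, j] = derivative of the j-th Lagrange basis function at node x[i]."""
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, 1.0)
    bary = 1.0 / diff.prod(axis=1)           # barycentric weights
    D = (bary[None, :] / bary[:, None]) / diff
    np.fill_diagonal(D, 0.0)
    np.fill_diagonal(D, -D.sum(axis=1))      # each row of D sums to zero
    return D


def lor_stiffness(x):
    """Piecewise-linear stiffness matrix on the mesh whose vertices are x."""
    h = np.diff(x)
    n = len(x)
    K = np.zeros((n, n))
    for e in range(n - 1):
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h[e]
    return K


def lor_spectrum(p):
    """Sorted eigenvalues of inv(K_lor) @ K_ho restricted to interior nodes."""
    x, w = gll_nodes_weights(p)
    D = diff_matrix(x)
    K_ho = D.T @ (w[:, None] * D)            # GLL quadrature is exact here
    K_lor = lor_stiffness(x)
    i = slice(1, -1)                         # drop Dirichlet boundary nodes
    lam = np.linalg.eigvals(np.linalg.solve(K_lor[i, i], K_ho[i, i]))
    return np.sort(lam.real)


if __name__ == "__main__":
    for p in (2, 4, 8, 16):
        lam = lor_spectrum(p)
        print(f"p = {p:2d}: lambda_min = {lam[0]:.4f}, lambda_max = {lam[-1]:.4f}")
```

In this 1D model the smallest eigenvalue cannot drop below 1, because piecewise-linear interpolation at the nodes never increases the energy of a function. The largest eigenvalue is what the script lets one observe as p increases. The sketch says nothing about the H(curl) and H(div) cases, which rely on the interpolation-histopolation basis described in the abstract.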