Low-Synch Gram-Schmidt with Delayed Reorthogonalization for Krylov Solvers
The parallel strong-scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms employ the Arnoldi algorithm for nonsymmetric matrices. The underlying orthogonalization scheme is left-looking and processes one column at a time. Thus, at least one global reduction is required per iteration. The traditional algorithm for generating the orthogonal Krylov basis vectors for the Krylov-Schur algorithm is classical Gram Schmidt applied twice with reorthogonalization (CGS2), requiring three global reductions per step. A new variant of CGS2 that requires only one reduction per iteration is applied to the Arnoldi-QR iteration. Strong-scaling results are presented for finding eigenvalue-pairs of nonsymmetric matrices. A preliminary attempt to derive a similar algorithm (one reduction per Arnoldi iteration with a robust orthogonalization scheme) was presented by Hernandez et al.(2007). Unlike our approach, their method is not forward stable for eigenvalues.
READ FULL TEXT