Using Mixed Precision in Low-Synchronization Reorthogonalized Block Classical Gram-Schmidt
Using lower precision in algorithms can reduce both computation and communication costs. Motivated by this, we aim to advance the state of the art in developing and analyzing mixed precision variants of iterative methods. In this work, we focus on the block variant of low-synchronization classical Gram-Schmidt with reorthogonalization, which we call BCGSI+LS. We demonstrate that the loss of orthogonality produced by this orthogonalization scheme can exceed O(u)κ(𝒳), where u is the unit roundoff and κ(𝒳) is the condition number of the matrix to be orthogonalized, and thus we cannot in general expect it to yield a backward stable block GMRES implementation. We then develop a mixed precision variant of this algorithm, called BCGSI+LS-MP, which uses higher precision in certain parts of the computation. We demonstrate experimentally that, for a number of challenging test problems, our mixed precision variant keeps the loss of orthogonality below O(u)κ(𝒳). This indicates that we can achieve a backward stable block GMRES algorithm that requires only one synchronization per iteration.
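To make the quantities in the abstract concrete, the following Python sketch measures the loss of orthogonality, taken here as ‖I − QᵀQ‖₂, for a computed basis Q and allows a comparison against u·κ(𝒳). It is only an illustration under stated assumptions: the routine is a generic block classical Gram-Schmidt with one reorthogonalization pass run in a single working precision, not the low-synchronization BCGSI+LS or the mixed precision BCGSI+LS-MP of the paper, and the helper names (`matrix_with_condition`, `loss_of_orthogonality`, `block_cgs_reorth`), the test matrix, and the block size are all hypothetical choices.

```python
import numpy as np

def matrix_with_condition(m, n, kappa, seed=0):
    """Build an m-by-n float64 matrix whose 2-norm condition number is kappa."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal matrix
    sigma = np.logspace(0, -np.log10(kappa), n)       # singular values 1 .. 1/kappa
    return (U * sigma) @ V.T

def loss_of_orthogonality(Q):
    """Return ||I - Q^T Q||_2, evaluated in float64."""
    Q = np.asarray(Q, dtype=np.float64)
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)

def block_cgs_reorth(X, s, dtype):
    """Generic block classical Gram-Schmidt with one reorthogonalization pass.

    Illustrative assumption: all arithmetic is done in the precision `dtype`,
    and only the orthonormal basis Q is formed (the R factor is discarded).
    """
    X = X.astype(dtype)
    m, n = X.shape
    Q = np.zeros((m, n), dtype=dtype)
    for k in range(0, n, s):
        W = X[:, k:k + s].copy()
        for _ in range(2):  # first pass, then reorthogonalization
            if k > 0:
                # Project out the span of the previously computed blocks.
                W -= Q[:, :k] @ (Q[:, :k].T @ W)
            # Orthonormalize the columns within the current block.
            W, _ = np.linalg.qr(W)
        Q[:, k:k + s] = W
    return Q

if __name__ == "__main__":
    X = matrix_with_condition(200, 20, 1e4)
    kappa = np.linalg.cond(X)
    for dtype in (np.float32, np.float64):
        u = np.finfo(dtype).eps / 2  # unit roundoff of the working precision
        Q = block_cgs_reorth(X, 4, dtype)
        print(dtype.__name__, "loss:", loss_of_orthogonality(Q), "u*kappa:", u * kappa)
```

Running the script prints the measured loss of orthogonality next to u·κ(𝒳) for single and double working precision. The same `loss_of_orthogonality` helper could be applied to the output of any other orthogonalization routine, which is how a comparison between a uniform precision and a mixed precision variant would typically be set up.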