Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling
Given a non-negative n × m real matrix A, the matrix scaling problem is to determine if it is possible to scale the rows and columns so that each row and each column sums to its specified target value. This problem arises in many algorithmic applications, perhaps most notably as a preconditioning step in solving a linear system of equations. One of the most natural and by now classical approaches to matrix scaling is the Sinkhorn-Knopp algorithm (also known as the RAS method), in which one alternately scales either all rows or all columns to meet the target values. In addition to being extremely simple and natural, the procedure has the further appeal of lending itself easily to parallelization. A central question is to understand the rate of convergence of the Sinkhorn-Knopp algorithm. In this paper, we present an elementary convergence analysis for the Sinkhorn-Knopp algorithm that improves upon the previous best bound. In a nutshell, our approach is to show a simple bound on the number of iterations needed for the KL-divergence between the current row sums and the target row sums to drop below a specified threshold δ, and then to connect the KL-divergence with the ℓ_1 and ℓ_2 distances. For ℓ_1, we can use Pinsker's inequality. For ℓ_2, we develop a strengthening of Pinsker's inequality, called (KL vs ℓ_1/ℓ_2) in the paper, which lower bounds the KL-divergence by a combination of the ℓ_1 and ℓ_2 distances. This inequality may be of independent interest. The idea of studying Sinkhorn-Knopp convergence via KL-divergence is not new and has indeed been previously explored. Our contribution is an elementary, self-contained presentation of this approach and an interesting new inequality that yields a significantly stronger convergence guarantee for the extensively studied ℓ_2-error.
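To make the alternating scaling concrete, here is a minimal NumPy sketch of the Sinkhorn-Knopp iteration described above. The function name `sinkhorn_knopp`, the parameters `num_iters` and `tol`, and the ℓ_1 stopping rule are illustrative choices, not taken from the paper, and the sketch assumes a matrix whose support admits a scaling (e.g., a strictly positive matrix).

```python
import numpy as np

def sinkhorn_knopp(A, r, c, num_iters=1000, tol=1e-9):
    """Alternately rescale the rows and columns of a non-negative matrix A
    so that the row sums approach the targets r and the column sums
    approach the targets c (the targets must satisfy sum(r) == sum(c)).

    Returns positive vectors x and y such that diag(x) @ A @ diag(y)
    approximately has the prescribed row and column sums.
    """
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    x = np.ones(n)  # row scaling factors
    y = np.ones(m)  # column scaling factors
    for _ in range(num_iters):
        # Scale every row so it sums exactly to its target r_i.
        x = r / (A @ y)
        # Scale every column so it sums exactly to its target c_j
        # (this perturbs the row sums again, hence the alternation).
        y = c / (A.T @ x)
        # Stop once the row sums are within tol of the targets in l1.
        row_sums = x * (A @ y)
        if np.abs(row_sums - r).sum() < tol:
            break
    return x, y

# Illustrative usage: scale a random positive 3 x 4 matrix so that
# every row sums to 1 and every column sums to 3/4.
A = np.random.rand(3, 4) + 0.1
x, y = sinkhorn_knopp(A, np.ones(3), np.full(4, 0.75))
B = np.diag(x) @ A @ np.diag(y)
```

Note that each half-step only normalizes all rows (or all columns) independently, which is why the method parallelizes so readily.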
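As a concrete instance of the ℓ_1 step, the standard form of Pinsker's inequality (with KL measured in nats) converts a KL bound into an ℓ_1 bound, as sketched below. The strengthened (KL vs ℓ_1/ℓ_2) inequality refines this by also involving the ℓ_2 distance; its exact form is not stated in this abstract and is given in the full text.

```latex
% Pinsker's inequality for probability distributions p and q:
\[
  D_{\mathrm{KL}}(p \,\|\, q) \;\ge\; \tfrac{1}{2}\,\lVert p - q \rVert_1^2 ,
\]
% so driving the KL-divergence below a threshold \delta forces
\[
  D_{\mathrm{KL}}(p \,\|\, q) < \delta
  \quad\Longrightarrow\quad
  \lVert p - q \rVert_1 \le \sqrt{2\delta}.
\]
```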