DINGO: Distributed Newton-Type Method for Gradient-Norm Optimization
For optimization of a sum of functions in a distributed computing environment, we present a novel communication efficient Newton-type algorithm that enjoys a variety of advantages over similar existing methods. Similar to Newton-MR, our algorithm, DINGO, is derived by optimization of the gradient's norm as a surrogate function. DINGO does not impose any specific form on the underlying functions, and its application range extends far beyond convexity. In addition, the distribution of the data across the computing environment can be almost arbitrary. Further, the underlying sub-problems of DINGO are simple linear least-squares, for which a plethora of efficient algorithms exist. Lastly, DINGO involves a few hyper-parameters that are easy to tune. Moreover, we theoretically show that DINGO is not sensitive to the choice of its hyper-parameters in that a strict reduction in the gradient norm is guaranteed, regardless of the selected hyper-parameters. We demonstrate empirical evidence of the effectiveness, stability and versatility of our method compared to other relevant algorithms, on both convex and non-convex problems.
READ FULL TEXT