Faster Convergence of Local SGD for Over-Parameterized Models

01/30/2022
by   Tiancheng Qin, et al.

Modern machine learning architectures are often highly expressive. They are usually over-parameterized and can interpolate the data by driving the empirical loss close to zero. We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized models in the heterogeneous data setting and improve upon the existing literature by establishing the following convergence rates. We show an error bound of O(exp(-T)) for strongly convex loss functions, where T is the total number of iterations. For general convex loss functions, we establish an error bound of O(1/T) under a mild data-similarity assumption and an error bound of O(K/T) otherwise, where K is the number of local steps. We also extend our results to non-convex loss functions by proving an error bound of O(K/T). Before our work, the best-known convergence rate for strongly convex loss functions was O(exp(-T/K)), and none existed for general convex or non-convex loss functions in the over-parameterized setting. We complement these results by providing problem instances in which such convergence rates are tight up to a constant factor under a reasonably small stepsize scheme. Finally, we validate our theoretical results using numerical experiments on real and synthetic data.
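To make the setting concrete, the following is a minimal sketch of Local SGD (FedAvg) in the interpolation regime: M workers each run K local SGD steps between communication rounds, and the server averages the local iterates. The problem instance (a shared 1D least-squares model that all workers can interpolate) and all names (M, K, R, LR) are illustrative choices, not taken from the paper.

```python
import random

random.seed(0)

# Interpolation regime: every worker's data is consistent with the same
# underlying parameter W_STAR, so the empirical loss can be driven to ~0.
W_STAR = 2.0
M, K, R, LR = 4, 5, 50, 0.2  # workers, local steps, rounds, stepsize

# Each worker holds 20 samples (x, y) with y = W_STAR * x.
data = [[(x, W_STAR * x) for x in (random.uniform(-1, 1) for _ in range(20))]
        for _ in range(M)]

def local_sgd(w, samples, k, lr):
    """Run k steps of SGD on the squared loss 0.5*(w*x - y)^2, from w."""
    for _ in range(k):
        x, y = random.choice(samples)
        w -= lr * (w * x - y) * x  # stochastic gradient step
    return w

def global_loss(w):
    """Average squared loss over all workers' data."""
    return sum(0.5 * (w * x - y) ** 2
               for d in data for (x, y) in d) / (M * 20)

w = 0.0  # shared initial model
for _ in range(R):
    # Each worker runs K local steps from the current global model,
    # then the server averages the resulting local iterates.
    local_iterates = [local_sgd(w, d, K, LR) for d in data]
    w = sum(local_iterates) / M

print(f"final w = {w:.4f}, loss = {global_loss(w):.2e}")
```

Because the labels are noiseless and the model can interpolate, the loss decays geometrically with no noise floor, mirroring the fast rates the abstract describes for the over-parameterized setting.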

