On the Convergence of FedAvg on Non-IID Data

07/04/2019
by   Xiang Li, et al.

Federated learning enables a large number of edge computing devices to jointly learn a centralized model while keeping all local data on the devices. As a leading algorithm in this setting, Federated Averaging (FedAvg) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the devices and averages the resulting model parameters only periodically. Despite its simplicity, it lacks theoretical guarantees in the federated setting. In this paper, we analyze the convergence of FedAvg on non-iid data. We investigate the effect of different sampling and averaging schemes, which are crucial especially when data are unbalanced. We prove a convergence rate of O(1/T) for FedAvg with proper sampling and averaging schemes for convex problems, where T is the total number of steps. Our results show that data heterogeneity, which is intrinsic to the federated setting, slows down convergence. We also show that a low device participation rate can be tolerated without severely harming the optimization process, and that there is a trade-off between communication efficiency and convergence rate. Finally, we analyze the necessity of learning rate decay by taking linear regression as an example. Our work serves as a guideline for algorithm design in applications of federated learning, where heterogeneous and unbalanced data are the common case.
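To make the algorithm concrete, here is a minimal sketch of one FedAvg communication round: a subset of devices runs a few local SGD steps from the current global model, and the server averages their models weighted by local data size. The function names, the least-squares gradient, and the data layout (`device_data[k]` as a list of `(x, y)` pairs) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def local_sgd(weights, data, lr, local_steps):
    """Run a few SGD steps on one device's local data.
    Uses a least-squares gradient purely as a placeholder objective."""
    w = weights.copy()
    for _ in range(local_steps):
        x, y = data[np.random.randint(len(data))]
        grad = 2.0 * (x @ w - y) * x
        w -= lr * grad
    return w

def fedavg_round(global_w, device_data, device_sizes, lr, local_steps, num_sampled):
    """One communication round: sample devices, run local SGD in parallel,
    then average the returned models weighted by local data size."""
    sampled = np.random.choice(len(device_data), num_sampled, replace=False)
    total = sum(device_sizes[k] for k in sampled)
    new_w = np.zeros_like(global_w)
    for k in sampled:
        local_w = local_sgd(global_w, device_data[k], lr, local_steps)
        new_w += (device_sizes[k] / total) * local_w  # weighted averaging scheme
    return new_w
```

The choice of sampling (which devices participate) and of averaging weights is exactly what the paper's analysis shows to matter when local datasets are unbalanced.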
