Making Byzantine Decentralized Learning Efficient

09/22/2022
by Sadegh Farhadkhani, et al.

Decentralized-SGD (D-SGD) distributes heavy learning tasks across multiple machines (a.k.a. nodes), effectively dividing the workload per node by the size of the system. However, a handful of Byzantine (i.e., misbehaving) nodes can jeopardize the entire learning procedure, and this vulnerability is further amplified when the system is asynchronous. Although approaches that confer Byzantine resilience to D-SGD have been proposed, they significantly degrade the efficiency of the process, to the point of even negating the benefit of decentralization. This naturally raises the question: can decentralized learning simultaneously enjoy Byzantine resilience and a reduced workload per node? We answer positively by proposing a new algorithm that ensures Byzantine resilience without losing the computational efficiency of D-SGD. Essentially, our algorithm weakens the impact of Byzantine nodes by reducing the variance of local updates using Polyak's momentum. Then, by establishing coordination between nodes via signed echo broadcast and a nearest-neighbor averaging scheme, we effectively tolerate Byzantine nodes while distributing the overhead among the non-Byzantine nodes. To establish the correctness of our algorithm, we introduce and analyze a novel Lyapunov function that accounts for the non-Markovian model drift arising from the use of momentum. We also demonstrate the efficiency of our algorithm through experiments on several image classification tasks.
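
As a rough illustration of the two mechanisms mentioned in the abstract, the sketch below pairs Polyak's momentum on local gradients with a coordinate-wise trimmed mean over neighbor models. This is a minimal sketch under stated assumptions, not the paper's actual algorithm: the function names, the learning-rate and momentum values, and the use of a trimmed mean as a stand-in for the nearest-neighbor averaging rule are all illustrative choices, and the signed echo broadcast step is omitted.

import numpy as np

def momentum_update(m_prev, grad, beta=0.9):
    # Polyak's momentum on the local stochastic gradient,
    # which reduces the variance of the update direction over time.
    return beta * m_prev + (1.0 - beta) * grad

def trimmed_mean(params, f):
    # Coordinate-wise trimmed mean: drop the f smallest and f largest
    # values per coordinate before averaging (one possible robust rule,
    # used here as an assumed stand-in for nearest-neighbor averaging).
    stacked = np.sort(np.stack(params), axis=0)
    return stacked[f:len(params) - f].mean(axis=0)

def honest_node_step(theta, m, grad, neighbor_thetas, lr=0.05, beta=0.9, f=1):
    # One round at an honest node: momentum step, local descent, then
    # robust averaging of its own model with the received neighbor models.
    m = momentum_update(m, grad, beta)
    theta = theta - lr * m
    theta = trimmed_mean(neighbor_thetas + [theta], f)
    return theta, m

# Toy usage with a quadratic loss; f = 1 trims one outlier per side,
# so a single Byzantine neighbor cannot dominate the average.
rng = np.random.default_rng(0)
theta, m = rng.normal(size=3), np.zeros(3)
neighbors = [rng.normal(size=3) for _ in range(4)]
grad = 2.0 * theta  # gradient of ||theta||^2
theta, m = honest_node_step(theta, m, grad, neighbors)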
