Byzantine Stochastic Gradient Descent

03/23/2018
by Dan Alistarh, et al.

This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the m machines which allegedly compute stochastic gradients every iteration, an α-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε-approximate minimizers of convex functions in T = Õ(1/(ε²m) + α²/ε²) iterations. In contrast, traditional mini-batch SGD needs T = O(1/(ε²m)) iterations, but cannot tolerate Byzantine failures. Further, we provide a lower bound showing that, up to logarithmic factors, our algorithm is information-theoretically optimal both in terms of sampling complexity and time complexity.
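To illustrate the failure model the abstract describes, the sketch below simulates distributed SGD on a simple convex quadratic with an α-fraction of Byzantine workers. Note this is not the paper's actual algorithm (which uses a more sophisticated concentration-based filtering step to achieve the stated rates); here a generic coordinate-wise median aggregator stands in as a minimal example of robust aggregation, and all names, constants, and the quadratic objective are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 10                   # dimension of the optimization variable
m = 20                   # total number of workers
alpha = 0.2              # assumed fraction of Byzantine workers (< 1/2)
n_byz = int(alpha * m)   # number of Byzantine workers
x_star = rng.normal(size=d)  # minimizer of f(x) = 0.5 * ||x - x_star||^2

def honest_gradient(x):
    # Honest worker: true gradient of the quadratic plus bounded noise.
    return (x - x_star) + 0.1 * rng.normal(size=d)

def byzantine_gradient(x):
    # Adversarial worker: reports a large gradient in the wrong direction.
    return -10.0 * (x - x_star) + 5.0

x = np.zeros(d)
lr = 0.5
for t in range(300):
    grads = [honest_gradient(x) for _ in range(m - n_byz)]
    grads += [byzantine_gradient(x) for _ in range(n_byz)]
    # Coordinate-wise median: with a majority of honest workers, the
    # aggregate stays close to the honest gradient despite the attackers.
    g = np.median(np.stack(grads), axis=0)
    x -= lr * g

loss = 0.5 * np.sum((x - x_star) ** 2)
print(f"final loss: {loss:.4f}")
```

Replacing the median with a plain average makes the same run diverge, since even one Byzantine worker can shift an unweighted mean arbitrarily; that gap is exactly what motivates the robust variant of SGD analyzed in the paper.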
