Communication-reduction techniques are a popular way to improve scalabil...
Data-parallel distributed training of deep neural networks (DNN) has gai...
The ability to scale out training workloads has been one of the key
perf...
Many communication-efficient variants of SGD use gradient quantization
s...
The population model is a standard way to represent large-scale decentra...