research ∙ 07/16/2023
Accelerating Distributed ML Training via Selective Synchronization
In distributed training, deep neural networks (DNNs) are launched over m...
research ∙ 05/20/2023
Taming Resource Heterogeneity In Distributed ML Training With Dynamic Batching
Current techniques and systems for distributed model training mostly ass...
research ∙ 05/20/2023
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
Distributed data-parallel (DDP) training improves overall application th...
research ∙ 03/12/2023
Scavenger: A Cloud Service for Optimizing Cost and Performance of ML Training
While the pay-as-you-go nature of cloud virtual machines (VMs) makes it ...
research ∙ 01/21/2023