Membership (membership query / membership testing) is a fundamental prob...
We propose Dirigo, a distributed stream processing service built atop vi...
We present CASSINI, a network-aware job scheduler for machine learning (...
State-of-the-art congestion control algorithms for data centers alone do...
Computer systems use many scheduling heuristics to allocate resources. U...
Federated learning (FL) is an emerging machine learning (ML) paradigm th...
Dynamic adaptation has become an essential technique in accelerating dis...
RDMA over Converged Ethernet (RoCE) has gained significant traction fo...
Machine learning is rapidly being used in database research to improve t...
Model aggregation, the process that updates model parameters, is an impo...
Many organizations employ compute clusters equipped with accelerators su...
Deep Neural Networks (DNNs) are witnessing increased adoption in multipl...
High performance rack-scale offerings package disaggregated pools of com...
Over the last few years, Deep Neural Networks (DNNs) have become ubiquit...
In networks today, the data plane handles forwarding—sending a packet to...
The increased use of micro-services to build web applications has spurre...
It is increasingly common to outsource network functions (NFs) to the cl...
Modern distributed machine learning (ML) training workloads benefit sign...
Existing distributed machine learning (DML) systems focus on improving t...
Today's distributed network control planes support multiple routing prot...
We present a scheduler that improves cluster utilization and job complet...