Pronto: Federated Task Scheduling
We present a federated, asynchronous, memory-limited algorithm for online task scheduling across large-scale networks of hundreds of workers. This is achieved through recent advancements in federated edge computing that unlocks the ability to incrementally compute local model updates within each node separately. This local model is then used along with incoming data to generate a rejection signal which reflects the overall node responsiveness and if it is able to accept an incoming task without resulting in degraded performance. Through this innovation, we allow each node to execute scheduling decisions on whether to accept an incoming job independently based on the workload seen thus far. Further, using the aggregate of the iterates a global view of the system can be constructed, as needed, and could be used to produce a holistic perspective of the system. We complement our findings, by an empirical evaluation on a large-scale real-world dataset of traces from a virtualized production data center that shows, while using limited memory, that our algorithm exhibits state-of-the-art performance. Concretely, it is able to predict changes in the system responsiveness ahead of time based on the industry-standard CPU-Ready metric and, in turn, can lead to better scheduling decisions and overall utilization of the available resources. Finally, in the absence of communication latency, it exhibits attractive horizontal scalability.
READ FULL TEXT