Network Aware Compute and Memory Allocation in Optically Composable Data Centres with Deep Reinforcement Learning and Graph Neural Networks
Resource-disaggregated data centre architectures promise a means of pooling resources remotely within data centres, allowing for both greater flexibility and greater resource efficiency in the increasingly important infrastructure-as-a-service business. This can be accomplished by using an optically circuit-switched backbone in the data centre network (DCN), which provides the bandwidth and latency guarantees required to ensure reliable performance when applications run across non-local resource pools. However, resource allocation in this scenario requires both server-level and network-level resources to be co-allocated to requests. The online nature and underlying combinatorial complexity of this problem, alongside the typical scale of DCN topologies, make exact solutions intractable and heuristic-based solutions sub-optimal or non-intuitive to design. We demonstrate that deep reinforcement learning, where the policy is modelled by a graph neural network, can be used to learn effective network-aware and topologically scalable allocation policies end-to-end. Compared to state-of-the-art heuristics for network-aware resource allocation, the method achieves up to 20% higher acceptance ratio, can achieve the same acceptance ratio as the best-performing heuristic with 3× fewer networking resources available, and can maintain all-around performance when directly applied (with no further training) to DCN topologies with 10^2× more servers than the topologies seen during training.
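To make the co-allocation problem and the acceptance-ratio metric concrete, the following is a minimal sketch of a first-fit, network-aware baseline. It is not the learned policy described above. The data layout (`servers`, `links`), the request fields (`cpu`, `mem`, `bw`) and the function names are all hypothetical, and the sketch assumes that compute and memory for a request may come from two different servers joined by a single direct link.

```python
from itertools import product


def try_allocate(servers, links, req):
    """First-fit co-allocation (illustrative baseline, not the paper's method).

    Picks one server for CPU and one for memory. If they differ, the direct
    link between them must also have enough free bandwidth (assumption:
    single-hop links only). Mutates `servers` and `links` on success.
    """
    for c, m in product(servers, repeat=2):
        if servers[c]["cpu"] < req["cpu"] or servers[m]["mem"] < req["mem"]:
            continue  # server-level resources insufficient
        if c != m:
            key = frozenset((c, m))
            if links.get(key, 0) < req["bw"]:
                continue  # network-level resources insufficient
            links[key] -= req["bw"]
        servers[c]["cpu"] -= req["cpu"]
        servers[m]["mem"] -= req["mem"]
        return True
    return False


def acceptance_ratio(servers, links, requests):
    """Fraction of requests, processed online in arrival order, that are accepted."""
    accepted = sum(try_allocate(servers, links, r) for r in requests)
    return accepted / len(requests)


# Toy instance (all values are made up for illustration).
servers = {"s0": {"cpu": 5, "mem": 2}, "s1": {"cpu": 0, "mem": 8}}
links = {frozenset(("s0", "s1")): 10}
requests = [
    {"cpu": 2, "mem": 2, "bw": 5},   # fits locally on s0
    {"cpu": 2, "mem": 4, "bw": 5},   # needs remote memory on s1 plus the link
    {"cpu": 1, "mem": 1, "bw": 20},  # CPU and memory exist, but not the bandwidth
]
ratio = acceptance_ratio(servers, links, requests)
print(ratio)
```

In this toy instance the third request is rejected purely for lack of link bandwidth, which is the kind of case a network-unaware allocator would not anticipate.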