Zero-touch Continuous Network Slicing Control via Scalable Actor-Critic Learning
Artificial intelligence (AI)-driven zero-touch network slicing is envisaged as a promising technology for harnessing the full potential of heterogeneous 5G and beyond-5G (B5G) communication systems and for automating demand-aware resource management and orchestration (MANO). In this paper, we tackle joint slice admission control and resource allocation in the B5G radio access network (RAN) for a proposed slice-enabling cell-free massive multiple-input multiple-output (mMIMO) setup, using a continuous deep reinforcement learning (DRL) method. We present a novel Actor-Critic-based network slicing approach called prioritized twin delayed distributional deep deterministic policy gradient (D-TD3). The paper defines, and corroborates through extensive experiments, a zero-touch network slicing scheme with a multi-objective approach in which the central server learns continuously, accumulating past knowledge to solve future problems, and autonomously reconfigures computing resources while minimizing latency, energy consumption, and virtual network function (VNF) instantiation cost for each slice. Moreover, we adopt a state-action return distribution learning approach together with the proposed replay policy and reward-penalty mechanisms. Finally, we present numerical results that show the gain of the adopted multi-objective strategy and verify its performance in terms of slice admission rate, latency, energy, CPU utilization, and time efficiency.
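D-TD3 builds on the twin delayed DDPG (TD3) family, but the abstract does not give its update rules. What follows is therefore only a minimal Python sketch, under stated assumptions, of three generic ingredients the abstract alludes to. The first is the scalar TD3 critic target (clipped double-Q with target policy smoothing). The second is a proportional prioritized-replay priority. The third is a hypothetical reward-penalty scalarization of the per-slice objectives (latency, energy, VNF instantiation cost). The function names, action bounds in [0, 1], weights, and fixed rejection penalty are illustrative assumptions, not the authors' values. Note also that D-TD3 learns a state-action return distribution, whereas this sketch computes only the scalar expected-return target of plain TD3.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Action = List[float]
State = Sequence[float]


def td3_target(
    reward: float,
    next_state: State,
    done: bool,
    target_actor: Callable[[State], Action],
    target_q1: Callable[[State, Action], float],
    target_q2: Callable[[State, Action], float],
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    a_low: float = 0.0,  # assumed action bounds (e.g. normalized resource shares)
    a_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Scalar TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    rng = rng if rng is not None else random.Random()
    # Target policy smoothing: perturb each target action component with
    # clipped Gaussian noise, then keep it inside the valid action range.
    smoothed: Action = []
    for a in target_actor(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(a_low, min(a_high, a + eps)))
    # Clipped double-Q: the smaller of the twin estimates curbs overestimation.
    q_next = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))
    return reward + gamma * (0.0 if done else q_next)


def replay_priority(td_error: float, alpha: float = 0.6, eps: float = 1e-6) -> float:
    """Proportional prioritized-replay priority (|delta| + eps) ** alpha."""
    return (abs(td_error) + eps) ** alpha


def slice_reward(
    latency: float,
    energy: float,
    vnf_cost: float,
    admitted: bool,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weights
    penalty: float = 10.0,  # hypothetical rejection penalty
) -> float:
    """Hypothetical reward-penalty scalarization for one slice request."""
    if not admitted:
        # A rejected slice request receives a fixed penalty.
        return -penalty
    w_lat, w_en, w_cost = weights
    # Lower latency, energy, and VNF instantiation cost yield a higher reward.
    return -(w_lat * latency + w_en * energy + w_cost * vnf_cost)
```

Taking the minimum over the twin critics is TD3's standard guard against Q-value overestimation. A distributional variant such as D-TD3 would replace each scalar critic output with a learned return distribution, which goes beyond what the abstract specifies.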