Management and Orchestration of Virtual Network Functions via Deep Reinforcement Learning
Management and orchestration (MANO) of the resources used by virtual network functions (VNFs) is one of the key challenges on the way to the fully virtualized network architecture envisaged by 5G standards. Current threshold-based policies inefficiently over-provision network resources and under-utilize available hardware, incurring high costs for network operators and, consequently, for users. In this work, we present a MANO algorithm for VNFs that allows a central unit (CU) to learn to autonomously re-configure resources (processing power and storage), deploy new VNF instances, or offload them to the cloud, depending on the network conditions, the available pool of resources, and the VNF requirements. The goal is to minimize a cost function that accounts for the economic cost as well as the latency and quality of service (QoS) experienced by the users. First, we formulate the stochastic resource optimization problem as a parameterized action Markov decision process (PAMDP). Then, we propose a solution based on deep reinforcement learning (DRL). More precisely, we present a novel RL approach called parameterized action twin (PAT) deterministic policy gradient, which leverages an actor-critic architecture to learn to provision resources to the VNFs in an online manner. Finally, we present numerical performance results and map them to 5G key performance indicators (KPIs). To the best of our knowledge, this is the first work that considers DRL for MANO of VNFs' physical resources.
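In a PAMDP, each decision pairs a discrete action with continuous parameters attached to that action. The Python sketch below is illustrative only and is not the PAT algorithm from this work. The three action names, the normalized `cpu`/`storage` parameters, the linear cost weights, and the `actor`/`critic` callables are all hypothetical. It shows one common way to select a hybrid action, used in P-DQN / parameterized-action DDPG style methods. The actor proposes parameters for every discrete action, and the critic scores each pair by estimated cost-to-go.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Tuple


class Action(Enum):
    # Hypothetical discrete actions mirroring the three options in the abstract.
    RECONFIGURE = 0  # re-size processing power / storage of an existing VNF
    DEPLOY = 1       # instantiate a new VNF
    OFFLOAD = 2      # move the VNF to the cloud


@dataclass
class Params:
    # Continuous parameters attached to a discrete action (assumed normalized to [0, 1]).
    cpu: float
    storage: float


@dataclass
class Weights:
    # Assumed trade-off weights; the paper's actual cost weighting is not given here.
    money: float = 1.0
    latency: float = 1.0
    qos: float = 1.0


def cost(econ: float, latency: float, qos_violation: float, w: Weights) -> float:
    """Illustrative per-step cost: weighted sum of economic cost, latency, and QoS penalty."""
    return w.money * econ + w.latency * latency + w.qos * qos_violation


def select_action(
    state: object,
    actor: Callable[[object], Dict[Action, Params]],
    critic: Callable[[object, Action, Params], float],
) -> Tuple[Action, Params]:
    """Return the (discrete action, parameters) pair with the lowest estimated cost-to-go.

    The actor proposes continuous parameters for every discrete action.
    The critic estimates the expected cost of each pair, and the cheapest pair is chosen.
    """
    proposals = actor(state)
    best = min(proposals, key=lambda a: critic(state, a, proposals[a]))
    return best, proposals[best]
```

In a trained agent, `actor` and `critic` would be neural networks updated from observed costs. Here they can be any callables. For example, a critic that rates `RECONFIGURE` as cheapest makes `select_action` return that action together with the parameters the actor proposed for it.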