Cognitive Agents for Self-optimizing NFV Deployments
Toward Autonomous Resource Management in Modern Telco Networks
October 2021

Nikos Anastopoulos,
Senior Product Manager & Solutions Architect, Telco & Enterprise Software

Loukas Gravias,
Data Scientist, Telco & Enterprise Software

Nikos Tsantanis,
Senior Product Marketing Manager, Telco & Enterprise Software


5G networks are becoming a reality across the globe, giving rise to new business models and innovative applications across several industries. This next generation of cellular networks is not simply an upgrade from previous ones but rather a structural evolution of contemporary telco networks, underpinned by emerging technologies.

One of these fundamental technologies is Network Functions Virtualization (NFV), which helps communications service providers (CSPs) reduce their total cost of ownership and increase their service scalability and agility. These benefits are achieved by allowing functions from the 5G mobile core and radio to run as lightweight, cloud-native software on standardized commodity hardware.

Despite its advantages, NFV is confronted with a challenge that can easily break its promise of greater flexibility and cost efficiency. Making optimal use of compute resources, in line with the runtime demand of network functions, is a very hard task, and one that becomes harder as mobile networks expand from the core to the edge. Efficient resource management is thus a key prerequisite for NFV to deliver on its promises.

"Autonomous resource management at scale demands advanced cognitive agents able to learn by themselves in an initially unknown environment."
The Need for Autonomous Resource Management

One of the key industry priorities since the early days of NFV was, and still remains, to guarantee carrier-grade performance for certain Virtual Network Functions (VNFs). In many cases this leads to hardware overprovisioning. For example, a common practice among CSPs is to deploy performance-critical VNFs in isolation, reserving a large portion of server resources upfront for their exclusive use. In this way, the VNFs have sufficient resources even during demand peaks, and are protected from contention with other workloads that could affect their Service Level Objectives (SLOs). The drawback, however, is that a significant amount of server resources remains unutilized, resulting in considerable OpEx and CapEx attributed to the NFV infrastructure.

To make things even more complicated, edge servers are expected to consolidate an unusually disparate mixture of mobile core functions, RAN components, over-the-top services and new types of applications requiring proximity to end users. This increase in software diversity and density will introduce uncertainty about how workloads on a single server will interfere with each other, which, in turn, gives rise to the challenge of optimally distributing shared compute resources to meet SLOs with a minimal resource footprint.

In this respect, CSPs can no longer depend on conventional approaches to manage resources efficiently, in real time, and at scale. Techniques ranging from rule-based automation to decision making based on analytics and historical data would be inadequate, as they fail to capture the highly dynamic nature of 5G deployments. An ideal solution must instead be able:

  • To handle multiple and diverse colocated services that have unpredictable interactions, independent load variations and, possibly, differentiated priorities.
  • To make resource allocation decisions accurately ("no more resources than needed for every service") and in a timely manner ("before SLA violations occur").
  • To dynamically adjust decisions in line with traffic load changes.

Such qualities suggest not simply an intelligent resource management solution but rather one that is able to autonomously learn and adapt to its environment.

Why Reinforcement Learning?

Such an autonomous resource management solution demands advanced AI-powered cognitive agents, able to learn by themselves to make optimal resource decisions in a live telco network while starting from zero prior knowledge. Learned knowledge can then potentially be disseminated among agents across edge locations, accelerating optimal decision making across the entire network.

The only technology that currently meets these ambitious requirements is Reinforcement Learning (RL). In RL, an agent is trained in a way that rewards desired behaviors and/or punishes undesired ones. RL has two distinct phases: a training (or exploration) phase and an inference (or exploitation) phase. During training, the agent takes random actions in its environment, exploring which of them lead to greater reward. During inference, the agent has already learned the best action in each state and leverages this knowledge to maximize the reward it gathers. Recent advances in Deep Learning have led to a new approach, namely Deep RL, which leverages the power of Deep Neural Networks to tackle much more complex environments than "pure" RL methods.
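The exploration/exploitation split described above can be made concrete with a minimal tabular Q-learning sketch (a textbook illustration with a toy environment of our own making, not SLRA's algorithm):

```python
import random

random.seed(0)  # for reproducibility of this toy run

N_STATES, N_ACTIONS = 4, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

# Q-table: learned value of each (state, action) pair, initially zero
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Toy environment: reward 1 when the action 'matches' the state."""
    reward = 1.0 if action == state % N_ACTIONS else 0.0
    next_state = (state + 1) % N_STATES
    return next_state, reward

def choose_action(state, training):
    # Training (exploration): act randomly with probability EPSILON.
    if training and random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    # Inference (exploitation): pick the action with the highest learned value.
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

state = 0
for _ in range(5000):  # training phase
    action = choose_action(state, training=True)
    next_state, reward = step(state, action)
    # Q-learning update: move toward reward plus discounted best future value.
    Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
    state = next_state
```

After training, calling `choose_action(state, training=False)` exploits the learned table instead of acting randomly; Deep RL replaces the explicit Q-table with a neural network so that far larger state spaces become tractable.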

[Figure: reinforcement learning example]

In the NFV context, RL can be used to determine the ideal distribution of shared resources among VNFs colocated on a server, so that:

  • SLOs of individual VNFs are always met, even when incoming traffic changes dynamically.
  • The amount of resources allocated to each VNF is the smallest possible, to prevent unnecessary resource usage and operational expenses.
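These two objectives can be encoded directly in the agent's reward signal: penalize SLO violations heavily, and resource usage mildly. The sketch below is purely hypothetical (the actual NFV-RI reward function is not public):

```python
def reward(latency_ms, slo_ms, allocated_units, max_units):
    """Hypothetical reward: a large penalty for violating the SLO,
    and a small penalty per allocated resource unit otherwise."""
    if latency_ms > slo_ms:
        return -10.0  # an SLO violation dominates any resource saving
    # SLO is met: reward frugality, i.e. fewer allocated units score higher.
    return 1.0 - allocated_units / max_units

# Meeting a 5 ms SLO with 2 of 8 cache ways scores better than with 6 of 8,
# and violating the SLO is worse than meeting it even at full allocation.
assert reward(4.0, 5.0, 2, 8) > reward(4.0, 5.0, 6, 8)
assert reward(6.0, 5.0, 2, 8) < reward(4.0, 5.0, 8, 8)
```

Maximizing cumulative reward then steers the agent toward the smallest allocation that still keeps latency under the threshold.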

Adopting RL in NFV signifies a promising step forward over other approaches for the following reasons:

  • The server resources on which a decision must be made can be many and may have tricky interdependencies.
  • The colocated VNFs for which a decision must be made can be many, as well, and may have conflicting resource requirements.
  • A resource decision cannot be one-off; it should be revised whenever a VNF’s traffic changes.

Combined, these factors lead to an explosion of the decision space into an extreme number of state-action pairs. On top of that, decisions must be taken in real time. These two imperatives make Deep RL the only realistic solution.
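A toy count (with illustrative numbers of our own, not from any real deployment) shows how quickly the space grows. By the stars-and-bars formula, splitting R indivisible units of one resource among V colocated VNFs allows C(R + V - 1, V - 1) configurations, and each additional independent resource multiplies the total:

```python
from math import comb

def allocations(units, vnfs):
    """Ways to split `units` indivisible resource units among `vnfs` VNFs
    (stars and bars: C(units + vnfs - 1, vnfs - 1))."""
    return comb(units + vnfs - 1, vnfs - 1)

one_resource = allocations(12, 4)                   # 12 cache ways, 4 VNFs -> 455
two_resources = one_resource * allocations(20, 4)   # add 20 bandwidth units -> 805,805
print(one_resource, two_resources)
```

With only two resource types and four VNFs the action space is already near a million per state; multiply by the states induced by traffic levels and the need for tabular methods to visit each pair repeatedly becomes hopeless, which is why a function approximator (a deep network) is required.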

RL Challenges in Production Systems

Despite its great potential, RL comes with certain challenges in production environments:

  1. Exploration is prohibitive

    RL must observe and act randomly upon a certain state many times in order to determine which action is optimal for each individual state. As an indication, full optimization of an application may require as many as 50,000 observations. Trying such a large number of random actions in a live system could lead to serious service degradation or even failure.

  2. Training is very slow

    In a production server, taking samples of performance metrics at sub-second frequency is usually impractical and of questionable value. Instead, it is very common for samples to be collected at multi-second granularity (e.g. 10 to 60 seconds), which results in an RL agent needing between six days and a month (!) to get fully trained.
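The six-days-to-a-month figure follows directly from the earlier estimate of 50,000 observations and the sampling interval:

```python
# Training time = observations needed x seconds per observation.
observations = 50_000
for interval_s in (10, 60):
    days = observations * interval_s / 86_400  # 86,400 seconds per day
    print(f"{interval_s} s sampling -> {days:.1f} days of training")
# 10 s sampling -> ~5.8 days; 60 s sampling -> ~34.7 days (about a month)
```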

  3. Retraining is required if the environment changes

    If the environment in which the RL agent was trained changes, there is a strong possibility that the VNFs can no longer meet their SLOs. Such changes may involve the amount or type of hardware resources, the application's incoming load, or the performance constraints set by the user. Because of these changes, the agent has to be retrained, which is a potentially very difficult task in the context of a large data center.

To overcome these challenges, Intracom Telecom created Self Learning Resource Allocation (SLRA), an advanced Deep RL system capable of optimizing any application.

Making RL Practical

SLRA introduces three key innovations:

  1. A heuristic mechanism able to narrow down the large set of random actions taken during training to a much smaller subset of actions that look most promising for meeting VNF SLOs. This approach, applied before the actual RL training, not only shrinks the exploration space but also helps avoid ‘destructive’ exploration by omitting actions that would very likely lead to SLO violations. Because its actions are moderated, this mechanism can easily run in the production environment without much concern.
  2. The Digital Twin, a ‘digital clone’ of an application that models its behavior under different hardware resource allocations and traffic conditions. With a Digital Twin in place for every deployed application, the RL agent can be trained outside the production environment, in simulation, thus avoiding direct interaction with the actual application. As expected, the latency of an observe-and-act loop during training decreases by several orders of magnitude (e.g. from seconds to milliseconds), while overall training time is reduced by 150x-9,000x, depending on the baseline (the frequency of metrics collection in the production environment). A Digital Twin is created using telemetry samples of the application, acquired from the production server under different resource allocations and traffic conditions. These samples can be collected from historical production data, through the heuristic mechanism, or from a staging environment when no SLO violations at all are tolerated in production.
  3. The SLO monitoring module, a feedback loop which addresses the burden of RL retraining in a fully automated fashion. After an RL agent is deployed to production, this module continuously checks for SLO violations. When one is detected for a certain application, a retraining process is automatically triggered: first, the Digital Twin is recreated for this application, leveraging the new production data; then, the RL agent is retrained against the new Digital Twin in order to discover new optimal resource allocations that do not violate SLOs.
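One pass of such a feedback loop might look like the following sketch (all names and structures here are our own illustration, not NFV-RI's actual interface):

```python
def check_slos(apps, get_latency_ms, rebuild_twin, retrain_agent):
    """One pass of a hypothetical SLO-monitoring loop. For each app whose
    latency exceeds its SLO, recreate the Digital Twin from fresh production
    data and retrain the RL agent against it. Returns the retrained apps."""
    retrained = []
    for app in apps:
        if get_latency_ms(app) > app["slo_ms"]:   # SLO violation detected
            twin = rebuild_twin(app)              # step 1: recreate the Digital Twin
            retrain_agent(app, twin)              # step 2: retrain the agent offline
            retrained.append(app["name"])
    return retrained

# Toy usage: one app violates its 5 ms SLO, the other meets its 20 ms SLO.
apps = [{"name": "upf", "slo_ms": 5.0}, {"name": "amf", "slo_ms": 20.0}]
latencies = {"upf": 7.2, "amf": 12.0}
result = check_slos(
    apps,
    get_latency_ms=lambda a: latencies[a["name"]],
    rebuild_twin=lambda a: f"twin-{a['name']}",
    retrain_agent=lambda a, t: None,
)
# result contains only the violating app
```

In production such a pass would run continuously on fresh telemetry, with the retraining itself happening against the Digital Twin rather than the live system.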

SLRA is a new feature of the NFV-RI™ solution and aspires to become the state-of-the-art paradigm for autonomous resource decisions in modern NFV deployments.

[Figure: SLRA in action]
Real-World 5G & Edge Use Cases

The potential of Deep RL has been demonstrated through a series of use cases and workloads representative of modern telco networks. In these, SLRA managed to minimize the use of resources, under varying load conditions, without incurring any SLO violations.


  • In an in-memory, data intensive VNF accepting varying traffic, SLRA managed to dynamically shrink by a significant factor the amount of resources (last level cache, memory bandwidth) needed by the VNF to maintain its latency under a certain threshold. Depending on the traffic level, SLRA achieved a mere 10-40% utilization of the resources in the baseline case, freeing up the remaining capacity for other workloads to use.
  • In a dataplane VNF accepting varying traffic, SLRA managed to adjust the VNF's CPU power in line with the traffic, without incurring SLO violations (packet drops). Compared to the baseline case, where the VNF was operating at maximum CPU frequency, server energy consumption was reduced by 10-30%.

These results indicate that Deep RL has great potential for dynamically optimizing various types of applications and can generalize across resources of different types and semantics, rendering it a more effective alternative to existing approaches.

At a larger scale, Intracom Telecom is evaluating SLRA in a PoC project of the ETSI Experiential Networked Intelligence Industry Specification Group. In the PoC, SLRA is challenged to fine-tune resources for multiple colocated VNFs implementing differentiated 5G slices, in order to meet latency SLOs fully automatically and always in line with incoming traffic.

To find out more on the NFV Resource Intelligence (NFV-RI™) solution and the SLRA application, please visit