A modern 5G telecommunications network is a complex system of interdependencies across many software entities and technologies. During the early phases of 5G deployments, service providers have been tempted to go with vertically integrated stacks to reduce time-to-value. With the rapid evolution of services and technologies connected to 5G, these vertically integrated stacks have become obstacles to service providers' competitiveness and to their ability to quickly introduce new services.

Further challenges arise from planned network maintenance windows that have been designed to support longer release life cycles, and from traditional network automation and orchestration tools not being fit for the 5G era. To better understand the scale of these challenges faced by service providers, it is necessary to outline certain scalability factors associated with 5G networks.

Network coverage and capacity requirements

The deployed frequency band influences the available bandwidth and coverage of a service. While a low-band frequency with a bandwidth of a few Kbps can cover up to 40 km, a high-band frequency with a bandwidth of 10 Gbps can cover only up to 300 meters. Achieving coverage similar to the low-band frequency therefore requires a much denser deployment, in the region of tens of thousands of cells. In fact, CTIA has estimated that US service providers will need more than 800,000 cells by 2026. This will have a profound impact on the lifecycle operations of service provider networks.

Using a small 10,000-cell deployment as an example, and assuming a typical four-hour service provider maintenance window per location, a network of this size requires about 40,000 hours (roughly 4.5 years of continuous work) to complete a single upgrade of all its locations if done sequentially. The typical service provider approach to maintenance windows makes this even more challenging, because procedure dictates that maintenance can only be scheduled during low-traffic intervals, normally between 1 am and 5 am. This effectively equates to one maintenance window per night.

Under these circumstances, a service provider that wants to complete a full upgrade within a year, using every day of the year, would have to successfully upgrade approximately 28 locations per night. It is important to note that this example is a small deployment, and that service providers have various holidays or special weeks during the year when no change is allowed in the network.
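The arithmetic behind these figures can be reproduced with a short sketch. The cell count, window length and 365-night year come from the example above; the function names and the 300-night scenario (65 change-freeze nights) are assumptions added purely for illustration.

```python
import math

CELLS = 10_000             # example deployment size from the text
WINDOW_HOURS = 4           # one maintenance window, 1 am to 5 am
HOURS_PER_YEAR = 24 * 365


def sequential_upgrade_hours(cells: int, window_hours: int) -> int:
    """Total hours if each location consumes one full window, one after another."""
    return cells * window_hours


def locations_per_night(cells: int, usable_nights: int) -> int:
    """Locations that must be upgraded every night to finish within the usable nights."""
    return math.ceil(cells / usable_nights)


total_hours = sequential_upgrade_hours(CELLS, WINDOW_HOURS)
print(total_hours)                              # 40000
print(round(total_hours / HOURS_PER_YEAR, 1))   # 4.6
print(locations_per_night(CELLS, 365))          # 28
# Hypothetical: 65 nights lost to holidays and change freezes.
print(locations_per_night(CELLS, 300))          # 34
```

Even a modest number of frozen nights pushes the nightly target noticeably higher, which is why the 28-per-night figure should be read as a lower bound.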

Network and system complexities

The numerous components, processes and stakeholders that make up, or are involved in, a network must also be considered:

  • The radio access network (RAN) elements that include the distributed unit (DU) and centralized unit (CU)
  • The multiple cloud-native network functions (CNFs) of a 5G Core network
  • The integrations and dependencies with services from previous generations
  • Multiple vendors for RAN and 5G Core services, each with their own release cycles

These give an insight into the complexity of the system and why service provider operations teams can be overwhelmed. 

The small 10,000-cell deployment example demonstrates the shortcomings of a typical service provider approach to the maintenance and operation of a telecommunications network. The end result is a complicated operational model. Complexity increases the number of possible failure scenarios, which leads to immobilism. Immobilism leads to rapid growth of the system's technical debt, which in turn multiplies failure scenarios and ultimately creates a vicious circle. A paradigm shift is needed: networks must be operated using models designed to scale, adapt and maintain continuous operation.

Creating an autonomous network

Key building blocks of these new models can be found in the patterns and practices of autonomous networks (AN). The AN patterns for service providers share the same core elements as artificial intelligence for IT operations (AIOps) in enterprise IT. A core requirement is to aggregate metrics, logs and events in a way that allows machine learning (ML) to learn and discover patterns, and allows an AI component to act autonomously.

Collection and processing of data is typically centralized within enterprise IT environments. This is not a valid option within service provider networks due to the sheer volume of data. Even if the data could be moved to a centralized location, the time taken to move it would render it useless. To address this, service providers aggregate data regionally or locally, and run the AIOps systems in a decentralized manner.
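As a rough illustration of regional aggregation, the sketch below reduces raw samples to one compact summary per region, so that only the summaries would need to leave the region. The `Sample` and `RegionSummary` types, the latency metric and the region names are all hypothetical; a real deployment would run this logic inside each region rather than in a single process.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    region: str        # hypothetical region label
    cell_id: str
    latency_ms: float  # hypothetical metric


@dataclass(frozen=True)
class RegionSummary:
    region: str
    samples: int
    mean_latency_ms: float
    max_latency_ms: float


def summarize_by_region(samples):
    """Collapse raw samples into one small summary per region."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample.region].append(sample.latency_ms)
    return [
        RegionSummary(region, len(values), mean(values), max(values))
        for region, values in sorted(grouped.items())
    ]


raw = [
    Sample("north", "cell-1", 10.0),
    Sample("north", "cell-2", 20.0),
    Sample("south", "cell-3", 5.0),
]
for summary in summarize_by_region(raw):
    print(summary)
```

The design choice here is that the volume of data crossing region boundaries scales with the number of regions, not with the number of cells or samples.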

When metrics, logs and events can be consumed by ML and AI, service providers can start moving towards an autonomous operation.

TMForum IG1218 describes functional aspects that should be fully automated to achieve a level 5 autonomous network:

  • Execution: automatic execution of provisioning, self-healing and operations tasks
  • Awareness: the ability of the system to sense real-time environment changes
  • Analysis: root cause and fault analysis to drive self-healing
  • Decision: closed-loop automation
  • Intent: declarative, machine processable goals, targets, requirements and constraints

These functional aspects can help with the maintenance challenge of modern telecommunications networks and lead to smart service provider operations.

Facilitating business outcomes

Smart service provider operations can facilitate a business intent to maintain industry leadership by enabling innovative, revenue-generating services. A key challenge for the service provider operations team is upgrading the various network functions within the network to realize new services. The team requires a complete understanding of the interdependencies between the network elements needed to provide the service. The following example illustrates some of these interdependencies:

  • Between the RAN DU and CU
  • From the RAN to the 5G Core access and mobility management function (AMF) and devices
  • Domain constraints that could include the prerequisite of handing over device sessions to an alternative DU serving the same area prior to the upgrade of a specific DU

Once these interdependencies are identified, a time slot for maintenance can be chosen. 
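The domain constraint in the list above (hand device sessions over to an alternative DU serving the same area before upgrading a DU) can be sketched as a simple batching rule: never upgrade more than one DU per area in the same maintenance window. The function name, DU identifiers and area names below are hypothetical, and the sketch ignores capacity, CU and AMF dependencies that a real planner would also have to weigh.

```python
from collections import defaultdict


def plan_du_batches(du_to_area: dict[str, str]) -> list[list[str]]:
    """Group DUs into windows so at most one DU per area is upgraded per window."""
    by_area = defaultdict(list)
    for du, area in sorted(du_to_area.items()):
        by_area[area].append(du)

    # Each area needs an alternative DU that can take over device sessions.
    for area, dus in by_area.items():
        if len(dus) < 2:
            raise ValueError(f"area {area!r} has no alternative DU for handover")

    windows_needed = max((len(dus) for dus in by_area.values()), default=0)
    return [
        [dus[i] for dus in by_area.values() if i < len(dus)]
        for i in range(windows_needed)
    ]


batches = plan_du_batches(
    {"du-1": "north", "du-2": "north", "du-3": "south", "du-4": "south"}
)
print(batches)  # [['du-1', 'du-3'], ['du-2', 'du-4']]
```

In each window, every area keeps at least one DU in service, which is the precondition for the handover described in the constraint.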

A golden rule trifecta

Smart service provider operations should be evaluated from the perspective of a golden rule trifecta for service provider operations teams: availability, reliability and operability.

Figure: A Venn diagram with three slightly overlapping circles labeled "availability" (red), "reliability" (yellow) and "operability" (green).

  • Availability refers to the percentage of time a service is functional even during failures of individual elements
  • Reliability refers to the probability a service will function as expected without failure for a given duration of time
  • Operability refers to the ability to follow operational requirements while keeping a service available in a reliable and functioning condition
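The first two definitions can be made concrete with a few standard textbook formulas. This is a minimal sketch under stated assumptions, not a model taken from the article: steady-state availability is computed from mean time between failures (MTBF) and mean time to repair (MTTR), redundancy assumes independent element failures, and reliability assumes a constant failure rate (exponential model). All function names and input values are illustrative.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the element is functional."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(element_availability: float, replicas: int) -> float:
    """Availability of a service that survives as long as one replica is up.

    Assumes replica failures are independent.
    """
    return 1 - (1 - element_availability) ** replicas


def reliability(duration_hours: float, mtbf_hours: float) -> float:
    """Probability of running the whole duration without failure.

    Assumes a constant failure rate (exponential model).
    """
    return math.exp(-duration_hours / mtbf_hours)


single = availability(mtbf_hours=999, mttr_hours=1)
print(single)                                   # 0.999
print(redundant_availability(single, replicas=2))
print(reliability(duration_hours=4, mtbf_hours=999))
```

The redundancy function reflects the "functional even during failures of individual elements" part of the availability definition: the service can be more available than any single element that supports it.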

Achieving the business intent of enabling innovative, revenue-generating services can be cumbersome even when dependencies and operational constraints are understood, and can lead to analysis paralysis. Overcoming this paralysis requires combining technical expertise, operational insights and business strategies to enable the rollout and monetization of new services.

How Red Hat can help

Adopting these new models will give the service provider the ability to operate their network smarter. A smart and autonomous network can take advantage of patterns and insights it can derive from aggregated metrics, logs and events of a region, and use that information with the encoded technical expertise and operational constraints of the service provider. Service providers will then be able to identify an ideal rollout strategy and discover the optimal execution time slots. 

Encoding the service provider’s technical expertise starts with the adoption of automation, spanning from traditional automation to event-driven automation. The service provider’s operational insights come from analytics of metrics, logs and events. The analytics component can be a general purpose or specialized data science stack programmed to discover relevant information of the network and system. The service provider’s operational constraints are encoded as system rules and policies that closed-loop remediation controllers must follow while addressing drifts or converging towards a new desired state.
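To show how operational constraints might be encoded as policies that a closed-loop controller has to respect, here is a minimal sketch of a single reconciliation step. The `Policy` fields, the `reconcile` function, the element names and the version strings are hypothetical; the 1 am to 5 am window is taken from the maintenance example earlier in the article, and the per-window cap is an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    window_start_hour: int = 1        # maintenance window from the earlier example
    window_end_hour: int = 5
    max_changes_per_window: int = 28  # hypothetical cap on changes per window


def reconcile(desired: dict, observed: dict, hour: int, policy: Policy) -> list:
    """Return the (element, target_version) actions the policy allows right now."""
    # Drift: every element whose observed version differs from the desired one.
    drift = sorted(
        (element, version)
        for element, version in desired.items()
        if observed.get(element) != version
    )
    in_window = policy.window_start_hour <= hour < policy.window_end_hour
    if not in_window:
        return []  # outside the maintenance window, no change is permitted
    return drift[: policy.max_changes_per_window]


desired = {"du-1": "v2", "du-2": "v2", "cu-1": "v2"}
observed = {"du-1": "v1", "du-2": "v2", "cu-1": "v1"}

print(reconcile(desired, observed, hour=2, policy=Policy(max_changes_per_window=1)))
# [('cu-1', 'v2')]
print(reconcile(desired, observed, hour=12, policy=Policy()))
# []
```

Separating the drift calculation from the policy check keeps the desired state declarative, while the rules about when and how much to change remain in one place that operations teams can review.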

Red Hat has a number of solutions that will allow service providers to evolve to a smart and autonomous network. Red Hat OpenShift Data Science is a managed cloud service that IT operations teams can enable for data scientists and developers of intelligent applications. It provides a fully supported environment in which to rapidly develop, train and test ML models in the public cloud before deploying in production.

Red Hat Integration is a comprehensive set of integration and messaging technologies to connect applications and data across hybrid infrastructures. It is an agile, distributed, containerized and API-centric solution. It provides service composition and orchestration, application connectivity and data transformation, real-time message streaming, change data capture and API management—all combined with a cloud-native platform and toolchain to support the full spectrum of modern application development.

Red Hat Ansible Automation Platform has the automation required to bridge cloud management, application development, release engineering, network and security operations. Supporting modern features like event-driven automation, it is designed to streamline and operationalize cloud configuration and management across multiple platforms and services. It helps service providers with critical use cases, such as configuration and management of workloads, become productive faster and with a consistent level of effort.

With the right solutions in place, and with a well-established smart operations practice, service providers can gain and maintain a leadership position and outperform the competition while reducing operational costs. From the perspective of service provider operations teams, adopting smart operations helps achieve a level of availability, reliability and operability they can only dream of today.

About the author

William is a seasoned professional with 25 years of experience enabling Telco business transformation through emerging technologies. He works with Telco and MSO partners and customers at the forefront of digital disruption on architecting solutions that transform markets. 
