
Telecom, media, and entertainment (TME) industries use the term burst to describe unexpected, unplanned, or peak demand for consumable services and products that existing capabilities and capacity can't handle.


One way TME service providers try to address these demand bursts is by signing partnership agreements with hyperscalers. This enables them to avoid unnecessary capital investments and handle temporary consumption increases in their services portfolio.

Bursting flow

TME service providers primarily use on-premises solutions as their application platform for various reasons, including:

  • Regulatory compliance (for example, data locality requirements)
  • A better ratio of total cost of ownership (TCO) to return on investment (ROI) for stable or saturated service consumption, thanks to optimized infrastructure and platform characteristics
  • End-to-end ownership and administration from infrastructure to platform and application stacks

TME providers can use hyperscalers to address bursts without compromising any of these reasons.

Things to consider in a 5G burst architecture

The key characteristics of bursts are that they:

  • Can occur at unpredictable dates or times
  • Have a temporary or ephemeral duration
  • Usually yield a low ROI against capital expenditures (capex) plus operational expenditures (opex)

Applications that are amenable to bursting onto hyperscaler resources are:

  • Truly cloud-native, with easy horizontal scalability
  • Easily integrated with the consumer traffic flow

Therefore, our solution needs to provide on-demand horizontal scaling, ephemeral resources, the fastest time to market, and the lowest TCO, including for cloud spending and talent.

TL;DR: Bursting should be implemented with the highest possible level of automation and the lowest possible infrastructure cost or investment.


Options for bursting 5G

Two major layers are subject to bursting in 5G: the 5G application stack that implements the 3rd Generation Partnership Project (3GPP) 5G standards, and the application platform that hosts it. There are two options for bursting the application platform: (A) using a hyperscaler to expand the existing platform, or (B) adding new ephemeral clusters on a hyperscaler.

Expanding the platform towards hyperscaler infrastructure

Although technically possible, option A is not the recommended approach because it enlarges the failure domain and the shared attack surface. The main cluster is already under heavy traffic, and adding worker capacity does not relieve the cluster control plane; in fact, the extra workers add load to it. Mixing different infrastructure types within one cluster also creates non-homogeneous configuration models (so-called "snowflakes") that complicate platform lifecycle management.

Note that cluster autoscaling is possible, and can be recommended, because it preserves the consistency of the underlying infrastructure layer. However, it does not address bursting from on-premises to a cloud or hyperscaler.

Adding ephemeral clusters on a hyperscaler to the application platform farm

Each additional ephemeral cluster comes with the cost of its own control plane. In return, control plane isolation shrinks the failure domain and segregates attack surfaces.

We picked the second option to build our recommended solution architecture.

Our solution architecture

Platform bursting solution topology

The solution's main components include:

1. Burst the platform

Platform management is the heart of the platform burst operation: it acts as an on-demand dispenser of 5G platforms (clusters).

The cluster pool functionality of Red Hat Advanced Cluster Management for Kubernetes (RH-ACM) provides rapid, cost-effective access to configured Red Hat OpenShift Container Platform (RH-OCP) clusters on demand and at scale. Cluster pools offer a configurable and scalable number of OCP clusters on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure that can be claimed when needed.

Create a cluster pool on AWS, GCP, Azure

Cluster pools are especially powerful for providing or replacing cluster environments for development, continuous integration, production scenarios, and addressing on-demand capacity increases (bursting).

Cluster pool sizing

You can specify how many clusters to keep running so that they are available to be claimed immediately when bursting to a cloud. The remaining clusters are held in a hibernating state, so they can be resumed and claimed much faster than a new cluster can be created.

Available versus hibernated clusters
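Under the hood, RH-ACM cluster pools are built on the Hive ClusterPool resource, where the sizing described above maps to two fields: `size` (unclaimed clusters the pool maintains) and `runningCount` (how many of those stay running rather than hibernating). The following Python sketch builds such a manifest for AWS as a plain dictionary. The field names follow the upstream Hive `hive.openshift.io/v1` API as we understand it; the pool name, namespace, region, domain, image set, and secret names are hypothetical placeholders, so check the ClusterPool CRD shipped with your RH-ACM version before relying on it.

```python
from typing import Any, Dict


def cluster_pool_manifest(
    name: str,
    namespace: str,
    size: int,
    running_count: int,
    region: str,
    base_domain: str,
    image_set: str,
    creds_secret: str,
    pull_secret: str,
) -> Dict[str, Any]:
    """Build a Hive ClusterPool manifest (sketch; all values are placeholders)."""
    # runningCount can't exceed size: it selects how many pooled clusters stay awake.
    if not 0 <= running_count <= size:
        raise ValueError("running_count must be between 0 and size")
    return {
        "apiVersion": "hive.openshift.io/v1",
        "kind": "ClusterPool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "size": size,                   # unclaimed clusters kept in the pool
            "runningCount": running_count,  # of those, how many stay running
            "baseDomain": base_domain,
            "imageSetRef": {"name": image_set},      # OCP release to install
            "pullSecretRef": {"name": pull_secret},
            "platform": {
                "aws": {
                    "region": region,
                    "credentialsSecretRef": {"name": creds_secret},
                }
            },
        },
    }
```

With `size=3` and `running_count=1`, one cluster is claimable immediately and two wait in hibernation. Because `oc apply` accepts JSON, the dictionary can be serialized with `json.dumps(...)` and piped to `oc apply -f -`.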

When a cluster claim is requested, the pool assigns a running cluster to it. If no running cluster is available, a hibernating cluster is resumed to serve the claim; if there is none, a new cluster is provisioned.

Available versus hibernated clusters on AWS EC2
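The claim order described above (running first, then resume a hibernating cluster, then provision a new one) and the pool's replenishment behavior can be illustrated with a small in-memory model. This is a toy sketch under our own assumptions, not the actual Hive or RH-ACM controller logic: `ClusterPoolSim`, `Cluster`, and the generated `cluster-N` names are hypothetical, and provisioning is instantaneous here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cluster:
    name: str
    state: str  # "running" or "hibernating"
    claimed_by: Optional[str] = None


@dataclass
class ClusterPoolSim:
    """Toy model of cluster pool claim and replenish behavior (hypothetical)."""
    size: int           # unclaimed clusters to maintain
    running_count: int  # of those, how many to keep running
    clusters: List[Cluster] = field(default_factory=list)
    _next_id: int = 0

    def _provision(self, state: str) -> Cluster:
        self._next_id += 1
        cluster = Cluster(f"cluster-{self._next_id}", state)
        self.clusters.append(cluster)
        return cluster

    def unclaimed(self, state: str) -> List[Cluster]:
        return [c for c in self.clusters if c.claimed_by is None and c.state == state]

    def reconcile(self) -> None:
        # Top up the pool with hibernating clusters until it holds `size` unclaimed ones.
        while len(self.unclaimed("running")) + len(self.unclaimed("hibernating")) < self.size:
            self._provision("hibernating")
        # Resume hibernating clusters until `running_count` are running.
        hibernating = self.unclaimed("hibernating")
        while len(self.unclaimed("running")) < self.running_count and hibernating:
            hibernating.pop(0).state = "running"

    def claim(self, claim_name: str) -> Cluster:
        running = self.unclaimed("running")
        if running:
            cluster = running[0]                 # 1. prefer a running cluster
        else:
            hibernating = self.unclaimed("hibernating")
            if hibernating:
                cluster = hibernating[0]         # 2. otherwise resume a hibernating one
            else:
                cluster = self._provision("hibernating")  # 3. otherwise create one
            cluster.state = "running"
        cluster.claimed_by = claim_name
        self.reconcile()  # replenish the pool after handing a cluster out
        return cluster
```

For example, a pool with `size=3` and `running_count=1` hands `cluster-1` to the first claim, then provisions one more hibernating cluster and resumes another so that one running and two hibernating clusters remain available.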


Claim a cluster

The cluster pool automatically creates new clusters and resumes hibernating clusters to maintain the specified size and number of available running clusters in the pool.

Successful cluster claim

A claim is fulfilled when a cluster in the pool is running and ready. The cluster pool then automatically creates new running and hibernated clusters as needed to keep meeting the requirements specified for it.

Claimed managed cluster
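A claim itself is a small resource that names the pool to draw from. The Python sketch below builds a Hive `ClusterClaim` manifest; as far as we know, the claim lives in the same namespace as its ClusterPool, and the optional `lifetime` field asks Hive to delete the claim (and with it the claimed cluster) after the given duration, which can suit a burst with a known end. The claim, namespace, and pool names are hypothetical, and the field names should be verified against your installed CRD.

```python
from typing import Any, Dict, Optional


def cluster_claim_manifest(
    name: str,
    namespace: str,
    pool_name: str,
    lifetime: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Hive ClusterClaim manifest (sketch; names are placeholders)."""
    spec: Dict[str, Any] = {"clusterPoolName": pool_name}
    if lifetime:
        # Optional duration string such as "8h"; after it elapses the claim is removed.
        spec["lifetime"] = lifetime
    return {
        "apiVersion": "hive.openshift.io/v1",
        "kind": "ClusterClaim",
        "metadata": {"name": name, "namespace": namespace},  # same namespace as the pool
        "spec": spec,
    }
```

Omitting `lifetime` leaves the claimed cluster in place until someone deletes the claim, which matches the manual destroy decisions discussed in the burst management section.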

When bursting ends (traffic levels return to normal) and the extra capacity is no longer needed, the system initiates destruction of the cluster pool. During cluster pool destruction, all unclaimed hibernating clusters are destroyed and their resources are released. See the "Burst management" section below for more information.

2. Burst the 5G stack

From the RH-ACM perspective, 5G Core is an application (containing multiple 5G microservices). The RH-ACM application model is based on subscribing to one or more Kubernetes resource repositories (channel resources) that contain the resources to deploy on managed clusters. Single-cluster and multicluster (burst-case) applications use the same Kubernetes specifications, but multicluster applications involve more automation of the deployment and application management lifecycle.

Application subscription model
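To make the subscription model concrete, the following Python sketch builds the two core resources: a `Channel` that points at a Git repository of 5G Core manifests, and a `Subscription` that deploys the channel's content to the clusters selected by a placement rule. The API group `apps.open-cluster-management.io/v1`, the `git-branch`/`git-path` annotations, and the `placementRef` structure reflect our understanding of the RH-ACM application API; the repository URL, names, and namespaces are hypothetical.

```python
from typing import Any, Dict, Optional

APPS_API = "apps.open-cluster-management.io/v1"


def channel_manifest(name: str, namespace: str, git_url: str) -> Dict[str, Any]:
    """Channel resource pointing RH-ACM at a Git repository (placeholder URL)."""
    return {
        "apiVersion": APPS_API,
        "kind": "Channel",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "Git", "pathname": git_url},
    }


def subscription_manifest(
    name: str,
    namespace: str,
    channel_ref: str,
    placement_rule: str,
    git_branch: Optional[str] = None,
    git_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Subscription deploying a channel's resources to clusters chosen by a PlacementRule."""
    annotations: Dict[str, str] = {}
    if git_branch:
        annotations["apps.open-cluster-management.io/git-branch"] = git_branch
    if git_path:
        # Subdirectory of the repository that holds the 5G Core manifests.
        annotations["apps.open-cluster-management.io/git-path"] = git_path
    return {
        "apiVersion": APPS_API,
        "kind": "Subscription",
        "metadata": {"name": name, "namespace": namespace, "annotations": annotations},
        "spec": {
            "channel": channel_ref,  # "<channel-namespace>/<channel-name>"
            "placement": {"placementRef": {"kind": "PlacementRule", "name": placement_rule}},
        },
    }
```

The same Subscription serves the single-cluster and burst cases; only the referenced placement rule decides how many clusters receive the 5G Core deployment.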

Placement rules define the target clusters where resource templates can be deployed, so you can use them to drive the multicluster deployment (bursting) of 5G Core. Placement rules are also used for governance and risk policies. See the multicloud-operators-placementrule project and the RH-ACM documentation on placement rules for details.

5G application stack deployment on RH-ACM
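A placement rule for bursting could select clusters by a label that marks them as burst capacity. The Python sketch below builds a `PlacementRule` manifest with a label selector and an availability condition, plus a small `matches` helper that mimics `matchLabels` semantics (every listed key must be present with the same value). The `burst: "true"` label is a hypothetical convention, not an RH-ACM default, and the manifest fields reflect our understanding of the `apps.open-cluster-management.io/v1` PlacementRule API.

```python
from typing import Any, Dict, Optional


def placement_rule_manifest(
    name: str,
    namespace: str,
    match_labels: Dict[str, str],
    replicas: Optional[int] = None,
) -> Dict[str, Any]:
    """PlacementRule selecting available managed clusters by label (sketch)."""
    spec: Dict[str, Any] = {
        "clusterSelector": {"matchLabels": dict(match_labels)},
        # Only place onto managed clusters that report themselves available.
        "clusterConditions": [{"type": "ManagedClusterConditionAvailable", "status": "True"}],
    }
    if replicas is not None:
        spec["clusterReplicas"] = replicas  # cap on the number of selected clusters
    return {
        "apiVersion": "apps.open-cluster-management.io/v1",
        "kind": "PlacementRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def matches(cluster_labels: Dict[str, str], match_labels: Dict[str, str]) -> bool:
    """matchLabels semantics: every selector key/value must appear on the cluster."""
    return all(cluster_labels.get(key) == value for key, value in match_labels.items())
```

Labeling each newly claimed hyperscaler cluster with the burst label is then enough for the existing Subscription to roll 5G Core onto it.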

3. Burst the traffic

Once the additional platforms on the hyperscaler are ready and the 5G stack is deployed on them, post-placement work is needed for consumer traffic management: the additional 5G capacity must be plugged into the incoming traffic path. This work must account for ingress controllers, fully qualified domain names (FQDNs), and the reachability of microservices in the other cluster. It can be done in various ways (individually or in combination), and our group's next article will elaborate on this topic.

  • Leveraging service mesh (option X): Use an Istio ingress gateway with Istio virtual services to load-balance across multiple 5G deployments in multiple clusters joined in a federated mesh. See the "service mesh federation" section of Edge computing: How to architect distributed scalable 5G with observability for details.
  • Leveraging external DNS for Kubernetes (option Y): Adding a newly created 5G deployment on a hyperscaler to an existing DNS record resolution path allows seamless service scaling. Visit the ExternalDNS Kubernetes GitHub repository for details.

The latter approach (option Y) can be coupled with geoproximity information to serve 5G consumers from the nearest deployment location, which makes it our favorite option so far. See AWS's geolocation routing page for details.
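For option Y, ExternalDNS watches annotated Kubernetes resources and publishes matching DNS records. The Python sketch below builds a `LoadBalancer` Service carrying the ExternalDNS hostname annotation plus, as we understand the ExternalDNS AWS provider, the set-identifier and Route 53 geolocation annotations; verify these annotation names against the ExternalDNS AWS documentation for your version. The service name, selector, port, FQDN, and coordinates are hypothetical. The `nearest_deployment` function is only a toy illustration of "serve consumers from the nearest deployment" using great-circle (haversine) distance; in practice Route 53 performs the location-based routing, and its geolocation policy routes by country or continent rather than by distance.

```python
import math
from typing import Any, Dict, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def external_dns_service(name: str, namespace: str, hostname: str,
                         set_identifier: str, country_code: str,
                         port: int = 443) -> Dict[str, Any]:
    """Service that ExternalDNS would publish under `hostname` (sketch)."""
    annotations = {
        "external-dns.alpha.kubernetes.io/hostname": hostname,
        # Distinguishes this cluster's record among several for the same name.
        "external-dns.alpha.kubernetes.io/set-identifier": set_identifier,
        "external-dns.alpha.kubernetes.io/aws-geolocation-country-code": country_code,
    }
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace, "annotations": annotations},
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},  # hypothetical pod label
            "ports": [{"port": port, "targetPort": port}],
        },
    }


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points on a sphere of Earth's mean radius."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_deployment(consumer: LatLon, deployments: Dict[str, LatLon]) -> str:
    """Pick the deployment closest to the consumer (toy proximity routing)."""
    return min(deployments, key=lambda name: haversine_km(consumer, deployments[name]))
```

With an on-premises deployment near Dallas and a burst deployment in AWS us-east-1, a consumer in New York would be steered to the burst cluster while a consumer in Houston stays on premises.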

Burst management

Burst season online shopping

For the best price/performance operational model, you must be conscious of resource usage over time. So when should you destroy a claimed cluster, and when should you destroy the whole cluster pool? Here are some approaches:

  • When should you destroy a claimed cluster? When a burst instance completes (for example, a major holiday such as Thanksgiving in the US) and traffic returns to normal, but the burst season is not over yet. The cluster pool stays up and ready to provide additional clusters when needed.
Destroying a cluster at the end of the burst period
  • When should you destroy a cluster pool? When the burst season completes (for example, the US holiday season, which spans the last two months of the year).
Destroying a cluster pool
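The two destroy decisions above can be captured as a small decision function. This Python sketch is a hypothetical encoding of the rules in this section: while traffic is above baseline, keep everything; when a burst instance ends inside the burst season, release claimed clusters but keep the pool; when the season ends, destroy the pool. The `in_holiday_season` helper hard-codes the US example season (November and December) purely for illustration; how you detect "traffic above baseline" is left to your monitoring.

```python
import datetime
from enum import Enum


class BurstAction(Enum):
    KEEP = "keep claimed clusters and the cluster pool"
    DESTROY_CLAIMED_CLUSTERS = "delete the claims, keep the cluster pool"
    DESTROY_POOL = "destroy the cluster pool"


def in_holiday_season(day: datetime.date) -> bool:
    """Example burst season: the last two months of the year (hypothetical rule)."""
    return day.month in (11, 12)


def burst_action(traffic_above_baseline: bool, burst_season_active: bool) -> BurstAction:
    """Decide what to tear down, following the rules described in this section."""
    if traffic_above_baseline:
        return BurstAction.KEEP  # the burst instance is still in progress
    if burst_season_active:
        # Burst instance over, season not: free claimed clusters, keep the pool warm.
        return BurstAction.DESTROY_CLAIMED_CLUSTERS
    return BurstAction.DESTROY_POOL  # season over: release the pool's resources too
```

For instance, the Monday after Thanksgiving with traffic back to normal yields `DESTROY_CLAIMED_CLUSTERS`, while a quiet day in early January yields `DESTROY_POOL`.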

Summary

We provided a lot of information here. Here is a simple flow diagram to summarize what we covered:

Flow of decision making for bursting

Remember there are multiple paths to fixing a problem or addressing a need. In sharing our solution, we included various choices with pros and cons. Our solution components will not fit every technical context or business reality. Therefore, it's important to remain open-minded and be able to adopt a better solution based on your needs.


This originally appeared on Medium as Burst OR not to burst! and is republished with permission.


About the authors

As a principal solutions architect, Brandon brings over 25 years of telco industry experience to the NA TME Tiger team. For several years, Brandon has been contributing to the development of OpenStack and Kubernetes, and as the original architect and project technical lead (PTL) of OpenStack-Helm, he has been specifically targeting cloud-native solutions for performant, telecom-based workloads. Prior to joining Red Hat, Brandon was a chief architect at Mavenir and also previously served as a lead architect at Charter and AT&T.
