How to use cloud hyperscalers to handle 5G traffic demand bursts

A 5G architecture that uses hyperscaler resources, rather than expanding your infrastructure, helps you handle temporary traffic bursts without permanent capital expense.

Telecom, media, and entertainment (TME) industries use the term burst to describe unexpected, unplanned, or peak interest in consumable services and products that existing capabilities and capacities aren't able to handle.

One way TME service providers try to address these demand bursts is by signing partnership agreements with hyperscalers. This enables them to avoid unnecessary capital investments and handle temporary consumption increases in their services portfolio.

Figure 1: Bursting flow (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

TME service providers primarily use on-premises solutions as their application platform for various reasons, including:

TME providers can use hyperscalers to address bursts while not compromising the reasons above.

Things to consider in a 5G burst architecture

The key characteristics of bursts are that they:

  • Can occur at unpredictable dates or times
  • Have a temporary or ephemeral duration
  • Are usually tied to a low return on investment (ROI) against capital expenditures (capex) plus operational expenditures (opex)

Applications that are amenable to burst with hyperscaler resources are:

  • Truly cloud-native with ease of horizontal scalability
  • Easily integrated with consumer traffic flow

Therefore, our solution needs to provide on-demand horizontal scaling, ephemeral resources, the fastest time to market, and the lowest total cost of ownership (TCO), including the cost of cloud spending and talent.

TL;DR: Bursting should be implemented with the highest level of automation and the lowest possible infrastructure cost or investment.
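
The ROI argument can be made concrete with some simple arithmetic. The following is a minimal sketch, not a pricing model: the function names and every number in the usage note are hypothetical, and it assumes owned capacity costs a one-time capex plus a monthly opex per unit, while burst capacity is billed per unit-hour.

```python
def owned_peak_cost(peak_units, capex_per_unit, opex_per_unit_month, months):
    """Cost of permanently owning enough capacity to cover the peak."""
    return peak_units * (capex_per_unit + opex_per_unit_month * months)


def burst_cost(base_units, peak_units, capex_per_unit, opex_per_unit_month,
               months, cloud_rate_per_unit_hour, burst_hours):
    """Cost of owning only the baseline and renting the gap during bursts."""
    # Baseline capacity is still owned and operated on-premises.
    base = base_units * (capex_per_unit + opex_per_unit_month * months)
    # Only the capacity above the baseline is rented, and only while bursting.
    rented = (peak_units - base_units) * cloud_rate_per_unit_hour * burst_hours
    return base + rented
```

With illustrative inputs (10 baseline units, 15 peak units, 10,000 capex per unit, 200 opex per unit per month, 12 months, a cloud rate of 2 per unit-hour, and 100 burst hours), the owned-peak model costs 186,000 and the burst model costs 125,000. Real figures depend entirely on your own pricing and traffic profile.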

Options for bursting 5G

Two layers are subject to bursting: the 5G application stack, which implements 3rd Generation Partnership Project (3GPP) 5G standards, and the application platform that accommodates it. There are two options for bursting the application platform: (A) using a hyperscaler to expand the size of the existing platform or (B) adding ephemeral new clusters on a hyperscaler.

Expanding the platform towards hyperscaler infrastructure

Although technically possible, option A (expanding the existing platform) is not the recommended approach because it increases the size of the failure domain and enlarges a common attack surface. The main cluster is already under heavy traffic, and adding new worker capacity does not relieve the cluster control plane; it adds load to it. Also, mixing different infrastructure types within one cluster creates non-homogeneous configuration models (also called "snowflakes") that complicate platform lifecycle management.

Note that cluster autoscaling is possible, and can be recommended, as long as it preserves the consistency of the serving infrastructure layer. However, it does not cover bursting from on-premises to cloud or hyperscaler use cases.

Adding ephemeral clusters on a hyperscaler to the application platform farm

Additional ephemeral clusters come with the cost of an additional cluster control plane. However, the separate control plane keeps the failure domain small and segregates attack surfaces through control plane isolation.

We picked the second option to build our recommended solution architecture.

Our solution architecture

Figure 2: Platform bursting solution topology (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

The solution's main components include:

1. Burst the platform

Platform management is the heart of the platform burst operation: it acts as an on-demand dispenser of 5G platforms (clusters).

The cluster pool functionality in Red Hat Advanced Cluster Management for Kubernetes (RH-ACM) provides rapid and cost-effective access to configured Red Hat OpenShift Container Platform (RH-OCP) clusters on demand and at scale. Cluster pools offer a configurable and scalable number of OCP clusters on Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure that can be claimed when needed.

Figure 3: Ready to consume clusters with dispenser pools on AWS, GCP, and Azure (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

Cluster pools are especially powerful for providing or replacing cluster environments for development, continuous integration, and production scenarios, and for addressing on-demand capacity increases (bursting).

Figure 4: Cluster pool sizing (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

You can specify the number of clusters to keep running so that they are available to be claimed immediately for bursting to a cloud. The remaining clusters will be held in a hibernating state so that they can be resumed and claimed quickly (compared to cluster creation).
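
As a rough illustration of the sizing described above, the sketch below builds a cluster pool manifest as a Python dictionary. It assumes the Hive `ClusterPool` resource that backs RH-ACM cluster pools, with `size` for the total pool size and `runningCount` for the clusters kept running. Treat the field names as an assumption to verify against the CRD installed in your hub cluster. The pool name, namespace, base domain, image set, and secret names are all hypothetical placeholders.

```python
import json


def cluster_pool_manifest(name, namespace, region, size, running_count):
    """Build a ClusterPool-style manifest (field names assumed from Hive)."""
    # Sanity check of this sketch only, not necessarily an API rule.
    if not 0 <= running_count <= size:
        raise ValueError("running_count must be between 0 and size")
    return {
        "apiVersion": "hive.openshift.io/v1",
        "kind": "ClusterPool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "size": size,                   # total clusters held by the pool
            "runningCount": running_count,  # clusters kept running, rest hibernate
            "baseDomain": "example.com",                    # placeholder
            "imageSetRef": {"name": "example-imageset"},    # placeholder
            "pullSecretRef": {"name": "example-pull-secret"},  # placeholder
            "platform": {
                "aws": {
                    "region": region,
                    "credentialsSecretRef": {"name": "example-aws-creds"},
                }
            },
        },
    }


if __name__ == "__main__":
    pool = cluster_pool_manifest("burst-pool", "burst", "us-east-1", 4, 1)
    print(json.dumps(pool, indent=2))
```

Here a pool of four clusters keeps one running for immediate claims and leaves the other three hibernating.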

Figure 5: Available vs. hibernated clusters ready to be claimed or used (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

When a cluster claim is requested, the pool assigns a running cluster to it. If no running clusters are available, a hibernating cluster is resumed to satisfy the claim, or a new cluster is provisioned.
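
The claim order described above can be sketched as a small in-memory model. This is an illustration of the decision logic only, not RH-ACM code: the `Pool` class, its fields, and the returned status strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pool:
    """Toy model of a cluster pool holding unclaimed clusters."""
    running: List[str] = field(default_factory=list)
    hibernating: List[str] = field(default_factory=list)
    provisioned: int = 0  # counter used to name newly provisioned clusters

    def claim(self) -> Tuple[str, str]:
        """Return (cluster_name, how_it_was_obtained) for one claim."""
        if self.running:
            # Fastest path: a running cluster is handed over immediately.
            return self.running.pop(0), "assigned-running"
        if self.hibernating:
            # Slower path: resume a hibernating cluster.
            return self.hibernating.pop(0), "resumed-hibernating"
        # Slowest path: nothing left in the pool, provision from scratch.
        self.provisioned += 1
        return f"new-cluster-{self.provisioned}", "provisioned-new"
```

Three consecutive claims against a pool with one running and one hibernating cluster walk through all three paths in order.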

Figure 6: Available vs. hibernated cluster nodes as seen in the AWS EC2 console (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

Figure 7: Claim a cluster (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

The cluster pool automatically creates new clusters and resumes hibernating clusters to maintain the specified size and number of available running clusters in the pool.
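
The replenishment behavior can be approximated with a small reconcile function. This is a simplified model under stated assumptions, not the actual controller: the function name, its arguments, and the action names are hypothetical, and it treats "create hibernating" as a single step.

```python
def reconcile(size, running_count, running, hibernating):
    """Return the actions needed to restore the pool's targets.

    size          -- target number of unclaimed clusters in the pool
    running_count -- target number of those that should be running
    running       -- unclaimed clusters currently running
    hibernating   -- unclaimed clusters currently hibernating
    """
    missing_running = max(running_count - running, 0)
    # Prefer resuming hibernating clusters over creating new ones.
    resume = min(missing_running, hibernating)
    create_running = missing_running - resume
    # Resuming does not change the pool total; creating does.
    total_after = running + hibernating + create_running
    create_hibernating = max(size - total_after, 0)
    return {
        "resume": resume,
        "create_running": create_running,
        "create_hibernating": create_hibernating,
    }
```

For example, after the only running cluster of a four-cluster pool has been claimed, the function suggests resuming one hibernating cluster and creating one new hibernating cluster to restore the pool.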

Figure 8: Successful cluster claim (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

A claim is fulfilled once a cluster in the pool is running and ready. The cluster pool then creates new running and hibernated clusters as needed to keep meeting the requirements specified for the pool.

Figure 9: Claimed cluster as a managed cluster (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

When bursting ends (when traffic levels return to normal) and extra capacity is no longer needed, the system initiates destruction of the cluster pool. In cluster pool destruction, all unclaimed hibernating clusters are destroyed, and their resources are released. See the "burst management" section below for more information.

2. Burst the 5G stack

From the RH-ACM perspective, 5G Core is an application made up of multiple 5G microservices. The application model is based on subscribing to one or more Kubernetes resource repositories (channel resources) that contain the resources deployed on managed clusters. Single-cluster and multicluster (burst-case) applications use the same Kubernetes specifications, but multicluster applications involve more automation of the deployment and application management lifecycle.

Figure 10: Application lifecycle management model with RH-ACM (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

Placement rules define the target clusters where resource templates can be deployed. You can use placement rules to facilitate multicluster deployment (bursting) of 5G Core. Placement rules are also used for governance and risk policies. See multicloud-operators-placementrule and the documentation on placement rules for details on multicloud placement rules.
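
To show the idea behind label-based placement, here is a minimal sketch of selecting target clusters by matching labels, with an optional cap on the number of targets. It is loosely modeled on a label selector plus a replica limit and is not the RH-ACM implementation. The function name, cluster names, and labels are hypothetical.

```python
from typing import Dict, List, Optional


def select_clusters(clusters: Dict[str, Dict[str, str]],
                    match_labels: Dict[str, str],
                    replicas: Optional[int] = None) -> List[str]:
    """Return names of clusters whose labels contain every match_labels pair."""
    matched = sorted(
        name
        for name, labels in clusters.items()
        if all(labels.get(key) == value for key, value in match_labels.items())
    )
    # Optionally limit how many clusters receive the deployment.
    return matched if replicas is None else matched[:replicas]


managed = {
    "onprem-main": {"env": "prod", "burst": "false"},
    "aws-burst-1": {"env": "prod", "burst": "true"},
    "gcp-burst-1": {"env": "prod", "burst": "true"},
}
burst_targets = select_clusters(managed, {"burst": "true"})
```

In this model, labeling a freshly claimed cluster with `burst=true` is what makes it eligible for the 5G Core deployment.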

Figure 11: 5G application stack deployment as seen in RH-ACM (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

3. Burst the traffic

Once additional platforms are ready on a hyperscaler with the 5G stack deployed, you need to do post-placement work for consumer traffic management: the additional 5G capacity must be plugged into the incoming traffic path. This work should account for ingress controllers, fully qualified domain names (FQDNs), and microservice reachability in the other cluster. It can be done in various ways (individually or in combination), and our group's next article will elaborate on this topic.

  • Leveraging service mesh (option X): Use Istio Ingress with Istio virtual services to implement load balancing across numerous deployments of 5G across multiple clusters with federated mesh. See the "service mesh federation" section of Edge computing: How to architect distributed scalable 5G with observability for details.
  • Leveraging external DNS for Kubernetes (option Y): Adding a newly created 5G deployment on a hyperscaler to an existing DNS record resolution path allows seamless service scaling. Visit the ExternalDNS Kubernetes GitHub repository for details.

The latter approach can be coupled with geoproximity information to serve 5G consumers with the nearest deployment location. Therefore, this is our favorite option so far. Please visit AWS's geolocation routing page for details.
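
The geoproximity idea can be illustrated with a short sketch that picks the deployment closest to a consumer by great-circle distance. In practice the DNS provider performs this routing. The code only shows the selection principle, and the deployment names and coordinates are approximate, hypothetical values.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def nearest_deployment(user, deployments):
    """Return the name of the deployment closest to the user's location."""
    return min(deployments, key=lambda name: haversine_km(user, deployments[name]))


# Hypothetical 5G deployments: one on-premises, one burst cluster in a cloud region.
deployments = {
    "onprem-dallas": (32.78, -96.80),
    "burst-aws-virginia": (39.00, -77.50),
}
```

A consumer near New York would be served by the burst cluster in Virginia, while a consumer near Houston would stay on the on-premises deployment in Dallas.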

Burst management

Figure 12: Google burst season (online shopping) stats (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

For the best price/performance operational model, you must be conscious of resource usage over time. So when should you destroy a cluster, and when a cluster pool? Here are some approaches:

  • When should you destroy a claimed cluster? When a burst instance completes (for example, a major holiday, such as Thanksgiving in the US) and traffic returns to normal for the time being, but the burst season is not over yet. The cluster pool is still up and ready to provide additional clusters when needed.
Figure 13: Destroying a cluster at the end of the burst period (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)
  • When should you destroy a cluster pool? When the burst season completes (for example, the US holiday season, which spans the last two months of the year).
Figure 14: Destroying a cluster pool (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)
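
The two teardown rules above can be condensed into one decision function. This is a minimal sketch under simple assumptions: a burst is "active" while traffic exceeds a threshold you define, and the burst season is a fixed date window. The function name, action strings, threshold, and dates are hypothetical.

```python
from datetime import date


def teardown_action(today, season_start, season_end, traffic, normal_threshold):
    """Decide what to tear down, given today's date and current traffic."""
    bursting = traffic > normal_threshold
    in_season = season_start <= today <= season_end
    if bursting:
        # Extra capacity is still in use.
        return "keep-claimed-cluster"
    if in_season:
        # Burst instance is over, but another one may follow: keep the pool.
        return "destroy-claimed-cluster"
    # Season is over: release the pool and its unclaimed clusters.
    return "destroy-cluster-pool"


SEASON = (date(2022, 11, 1), date(2022, 12, 31))
```

With a season spanning November and December, normal traffic a few days after a holiday peak leads to destroying only the claimed cluster, while normal traffic in January leads to destroying the pool.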

Summary

We provided a lot of information here. Here is a simple flow diagram to summarize what we covered:

Figure 15: Flow of decision making for bursting (Fatih Nar and Brandon Jozsa, CC BY-SA 4.0)

Remember there are multiple paths to fixing a problem or addressing a need. In sharing our solution, we included various choices with pros and cons. Our solution components will not fit every technical context or business reality. Therefore, it's important to remain open-minded and be able to adopt a better solution based on your needs.


This originally appeared on Medium as Burst OR not to burst! and is republished with permission.

Fatih Nar

Fatih (aka The Cloudified Turk) has been involved for several years in the Linux, OpenStack, and Kubernetes communities, influencing development and ecosystem cultivation, including for workloads specific to telecom, media, and ...

Brandon Jozsa

As a principal solutions architect, Brandon brings over 25 years of telco industry experience to the NA TME Tiger team.
