This post outlines how the Horizontal Pod Autoscaler (HPA) behaves when it scales on memory, using a simple Quarkus application.


The Horizontal Pod Autoscaler based on memory automatically scales the number of pods in a replication controller, deployment, replica set, or stateful set based on observed memory utilization. For more information, see the Horizontal Pod Autoscaler documentation.

HPA in OpenShift works in three steps:

  1. Collect the resource metric: runtime -> cAdvisor -> kubelet -> Prometheus -> Prometheus adapter -> HPA
  2. Calculate the desired number of replicas based on memory consumption:
    desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
    For example, if the current metric value is 200MB and the desired value is 100MB, the number of replicas is doubled, since 200.0 / 100.0 = 2.0
  3. Scale to the desired replica count
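The calculation in step 2 can be sketched in a few lines of Java. This is only an illustration of the formula above, not the autoscaler's actual code. The class and method names are hypothetical, and the clamp to minimum and maximum replica bounds is an assumption about how those bounds are applied.

```java
// Illustrative sketch of the replica formula; names are hypothetical.
class HpaReplicaMath {

    /** desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)] */
    static int desiredReplicas(int currentReplicas, double currentMetricValue, double desiredMetricValue) {
        return (int) Math.ceil(currentReplicas * (currentMetricValue / desiredMetricValue));
    }

    /** Assumption: the result is kept within the configured replica bounds. */
    static int clamp(int desired, int minReplicas, int maxReplicas) {
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // One replica at 200MB against a desired 100MB: the replica count doubles.
        int desired = desiredReplicas(1, 200.0, 100.0);
        System.out.println("desiredReplicas = " + desired);
        System.out.println("within bounds 2..10 = " + clamp(desired, 2, 10));
    }
}
```

Because of the ceiling, any ratio above 1.0 that does not divide evenly rounds up to the next whole pod.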




Creating an HPA from the OpenShift console is easy. Log in as kubeadmin, go to Workloads, click Horizontal Pod Autoscaler, and create one:


Use Case for HPA Based on Memory

Memory-based HPA scaling is a good fit when:

  • You have incoming tasks that can be distributed over a number of workers (for example, HTTP connections or work items).
  • Adding more workers reduces the memory burden on the other workers.

Use Case for Stateless Application

The following YAML settings tell HPA to add pods when average memory utilization reaches 60% or more, keeping the replica count between 2 and 10 pods. This is how we set memory-based HPA in the OpenShift cluster:

minReplicas: 2
maxReplicas: 10
targetAverageUtilization: 60

Assumptions for the use case below: memory utilization increases by 30 percentage points for every additional 100 requests a pod receives, and the load balancer distributes traffic evenly across the pods. For example:

100 requests = 30% memory utilization, 300 requests = 90% memory utilization
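The assumptions above amount to a small linear model. A minimal sketch, with hypothetical class and method names, assuming perfectly even traffic distribution:

```java
// Illustrative model of the stated assumptions; names are hypothetical.
class LoadModel {
    // Assumption from the text: 30 percentage points per 100 requests on a pod.
    static final double PERCENT_PER_100_REQUESTS = 30.0;

    /** Memory utilization of each pod when traffic is split evenly across replicas. */
    static double utilizationPerPod(int totalRequests, int replicas) {
        double requestsPerPod = (double) totalRequests / replicas;
        return requestsPerPod / 100.0 * PERCENT_PER_100_REQUESTS;
    }

    public static void main(String[] args) {
        System.out.println("200 requests over 2 pods: " + utilizationPerPod(200, 2) + "%");
        System.out.println("600 requests over 2 pods: " + utilizationPerPod(600, 2) + "%");
    }
}
```

Real applications rarely scale memory this linearly. The model only makes the arithmetic in the use case easy to follow.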

Before we get into the use case, I want to make sure we all understand the concept of request and limit in OpenShift.

When you specify the resource request for containers in a pod, the scheduler uses this information to decide which node to place the pod on. When you specify a resource limit for a container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set. The kubelet also reserves at least the request amount of that system resource specifically for that container to use.

Memory-based HPA scales relative to the request value. For example, targetAverageUtilization: 60 means scale when average memory use reaches 60% of the requested value, as shown in Figure 1:


Figure 1: Request and Limit
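To make the relationship to the request concrete, here is a rough sketch of how average utilization could be computed from per-pod memory use. The 256 MiB request and the per-pod usage figures are hypothetical values chosen for illustration, and the class is not part of any OpenShift API.

```java
// Illustrative only: utilization is measured against the request, not the limit.
class MemoryUtilization {

    /** Memory use of one pod as a percentage of its memory request. */
    static double percentOfRequest(long usedBytes, long requestBytes) {
        return 100.0 * usedBytes / requestBytes;
    }

    /** Average utilization across pods that share the same request. */
    static double averagePercent(long[] usedBytesPerPod, long requestBytes) {
        double sum = 0.0;
        for (long used : usedBytesPerPod) {
            sum += percentOfRequest(used, requestBytes);
        }
        return sum / usedBytesPerPod.length;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long request = 256 * mib;                // hypothetical request
        long[] used = {128 * mib, 192 * mib};    // hypothetical usage of two pods
        double average = averagePercent(used, request);
        System.out.println("average utilization = " + average + "%");
        System.out.println("above 60% target: " + (average > 60.0));
    }
}
```

Note that the limit never appears in the calculation. It only caps how much memory a container may use.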

When the app (MyAPP) is initially deployed, it receives 200 requests that are distributed equally across the two replica pods (minReplicas: 2 in the YAML). Based on our assumptions above, each pod handles 100 requests, which yields memory use of 30% in each pod, as shown in Figure 2:


Figure 2: Stateless Application

desiredReplicas = 2 (ceil[2 * (30 / 60)] = 1, raised to minReplicas)
currentMetricValue = 30
desiredMetricValue = 60

After the initial deployment, the app becomes popular and traffic increases to 600 requests. The load balancer tries to distribute the traffic equally across the two pods, so each pod receives 300 requests and average memory use rises to 90%. This is more than the desired memory use of 60%, so HPA scales out. Applying desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )] gives ceil[2 * (90 / 60)] = 3, so HPA adds one new pod. With three pods, each pod handles 200 requests and memory use returns to the 60% target, as shown in Figure 3:

Figure 3: Scaling Stateless Applications

desiredReplicas = 3
currentMetricValue = 90
desiredMetricValue = 60

Example: How to Scale Quarkus Applications Based on Memory Use

Step 1: Create a new Quarkus project

We’ll use a Maven plug-in to scaffold a new project with the following command:

$ mvn io.quarkus:quarkus-maven-plugin:1.11.1.Final:create \
  -DprojectGroupId=org.acme \
  -DprojectArtifactId=quarkus-hpa \
  -DprojectVersion=1.0.0-SNAPSHOT \
  -DclassName="org.acme.GreeterResource"

This command generates a quarkus-hpa directory that contains the new Quarkus project. When you open the GreeterResource class file in src/main/java/org/acme, you will see a simple RESTful API implementation in the hello() method. Append the following fill() method, which allocates a large dummy string (2 * 1024 * 1024 characters) in the memory of the Quarkus runtime every time it is called:

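// Needs java.util.Arrays and java.util.HashMap, plus the javax.ws.rs GET, Path, and PathParam annotations.
@GET
@Path("/fill/{index}")
public String fill(@PathParam("index") String index) throws Exception {
  // The map is local, so each call allocates a fresh string that becomes garbage after the request.
  HashMap<String, String> mem = new HashMap<String, String>();
  char[] chars = new char[2 * 1024 * 1024]; // dummy payload
  Arrays.fill(chars, 'f');
  mem.put(Math.random() + "", new String(chars));
  System.out.println("Added " + index + "MB");
  return "Added " + index + "MB \n";
}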

You can then use the OpenShift extension and Maven plug-in to deploy the application to your remote OpenShift cluster. Append the following configuration to your Quarkus project's application.properties file:

# OpenShift extension configuration

# Container Resources Management

Note: We set request.memory lower than usual so that HPA can scale out the pod (Quarkus app) in a short time (up to 2 minutes).

Step 2: Build and deploy the Quarkus application

To log in to the OpenShift cluster, install the oc command-line interface and run oc login. Installation options for the CLI vary depending on your operating system.

Assuming you have oc installed, execute the following command in your Quarkus project home directory:

$ oc new-project hpa-quarkus-demo
$ mvn clean package -DskipTests

These commands create a new project in the remote OpenShift cluster, then package the Quarkus application and deploy it to OpenShift. The output should end with BUILD SUCCESS.

Using the oc annotate command, add the load-balancing algorithm to the route. This roundrobin annotation will rebalance the network traffic to all running pods automatically:

$ oc annotate route quarkus-hpa haproxy.router.openshift.io/balance=roundrobin

Step 3: Create a horizontal pod autoscaler object for memory use

Now, go to the Developer console in the OpenShift cluster and navigate to the Topology view. You will see that your Quarkus application has been deployed. Click on Actions to add the Horizontal Pod Autoscaler, as shown in Figure 6:


Figure 6: The Quarkus application in the Topology view.

You should see the configuration page. Set Maximum Pods to 10 and Memory Utilization to 60, then click the Save button, as shown in Figure 7:


Figure 7: Add Horizontal Pod Autoscaler

Next, return to the local environment, then execute the following curl command to invoke the fill() method:

$ for ((i = 1; i <= 100; i++ )); do curl http://YOUR_APP_ROUTE_URL/hello/fill/$i ; sleep 2 ; done
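If you would rather drive the load from Java than from a shell loop, the same calls can be sketched with the JDK's built-in HTTP client (Java 11 or later). The class name is hypothetical, and the route URL is passed as an argument because it depends on your cluster.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative load generator that mirrors the curl loop; not part of the demo project.
class FillLoadGenerator {

    /** Builds the URL the curl loop calls: BASE/hello/fill/i */
    static URI fillUri(String baseUrl, int i) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return URI.create(base + "/hello/fill/" + i);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java FillLoadGenerator.java http://YOUR_APP_ROUTE_URL");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 1; i <= 100; i++) {
            HttpRequest request = HttpRequest.newBuilder(fillUri(args[0], i)).GET().build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.print(response.body());
            Thread.sleep(2000); // same 2-second pause as the curl loop
        }
    }
}
```

Like the shell loop, this sends 100 sequential requests with a two-second pause between them.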

Great! Go back to the Topology view, and you will see the Quarkus application starting to scale out (for example, to 3 pods). Scaling out usually takes a few minutes, as shown in Figure 8:


Figure 8: Scaling Pods

When you click on View logs, you will see how OpenShift rebalances the network traffic to multiple Quarkus pods automatically, as shown in Figure 9:


Figure 9: Rebalancing Network Traffic

When you navigate to Administrator > Monitoring > Dashboards, you can open the Grafana dashboard to track the memory use of the Quarkus pods relative to their request, as well as the number of scaled pods, along with Prometheus metrics, as shown in Figure 10.


Figure 10: Grafana Dashboard

Once average memory use drops below 60 percent, the extra pods are removed and the application scales back down to one pod. Scaling in usually takes longer than scaling out because HPA first makes sure that the extra pods are really no longer needed, which avoids removing pods during short periods of inactivity.
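One way to picture this delay is a stabilization window: the autoscaler remembers its recent recommendations and acts on the highest one, so a brief dip in load does not remove pods immediately. Upstream Kubernetes documents a scale-down stabilization window of five minutes by default. Whether your cluster uses that value is an assumption here, and the sketch below is a simplified model with hypothetical names, not the autoscaler's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a scale-down stabilization window.
class ScaleDownStabilizer {
    private final long windowSeconds;
    // Each entry is {timestampSeconds, recommendedReplicas}.
    private final Deque<long[]> history = new ArrayDeque<>();

    ScaleDownStabilizer(long windowSeconds) {
        this.windowSeconds = windowSeconds;
    }

    /** Records a recommendation and returns the highest one still inside the window. */
    int stabilize(long nowSeconds, int recommendedReplicas) {
        history.addLast(new long[] {nowSeconds, recommendedReplicas});
        // Drop recommendations that are older than the window.
        while (!history.isEmpty() && history.peekFirst()[0] < nowSeconds - windowSeconds) {
            history.removeFirst();
        }
        int highest = 0;
        for (long[] entry : history) {
            highest = Math.max(highest, (int) entry[1]);
        }
        return highest;
    }

    public static void main(String[] args) {
        ScaleDownStabilizer stabilizer = new ScaleDownStabilizer(300); // assumed 5-minute window
        System.out.println(stabilizer.stabilize(0, 3));   // load is high
        System.out.println(stabilizer.stabilize(60, 1));  // load dropped, older recommendation still counts
        System.out.println(stabilizer.stabilize(301, 1)); // older recommendation has left the window
    }
}
```

In this model the replica count only falls after every recommendation in the window agrees that fewer pods are enough.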

Here is the video link to the Quarkus application demo with HPA.


Most applications use a dynamic memory management library for resource allocation. Such applications may not ever release unused memory back to the operating system until the application is finished. For example, consider a C program using the glibc library functions malloc() and free() to acquire and release memory for the application's use. When a malloc() call is made, glibc will acquire a block of memory from the operating system for use by the program. It will then divide up the block and return pieces of memory to callers of malloc(). However, when the application calls free(), the glibc library may not be able to release the memory back to the operating system, because a part of it may still be in use by the application. Over time, memory fragmentation may occur as a result of many malloc() and free() calls. Due to this fragmentation, the application may be using very little memory from the perspective of the dynamic memory management library while actually holding a much higher amount of memory from the operating system.  This "high watermark" usage pattern means that HPA memory-based scaling may be less effective for these types of applications where the metrics show a high amount of usage for an application that once used a lot of memory but has since released it.
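The "high watermark" pattern can be illustrated with a deliberately simplified model in which memory handed back to the allocator is never returned to the operating system. Real allocators such as glibc can return some memory, so treat this as a worst-case sketch. The class name and the 256 MiB request are hypothetical.

```java
// Simplified worst-case model: the process footprint never shrinks.
class HighWatermarkModel {
    private long inUseBytes;      // what the application currently has allocated
    private long heldFromOsBytes; // what the process holds from the operating system

    void allocate(long bytes) {
        inUseBytes += bytes;
        heldFromOsBytes = Math.max(heldFromOsBytes, inUseBytes);
    }

    /** Memory goes back to the allocator, not to the operating system. */
    void free(long bytes) {
        inUseBytes = Math.max(0, inUseBytes - bytes);
    }

    long inUseBytes() { return inUseBytes; }

    long heldFromOsBytes() { return heldFromOsBytes; }

    static double percentOfRequest(long bytes, long requestBytes) {
        return 100.0 * bytes / requestBytes;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long request = 256 * mib; // hypothetical memory request
        HighWatermarkModel process = new HighWatermarkModel();
        process.allocate(200 * mib); // burst of work
        process.free(150 * mib);     // work finished, most memory released by the application
        System.out.println("application view: " + percentOfRequest(process.inUseBytes(), request) + "%");
        System.out.println("metrics view:     " + percentOfRequest(process.heldFromOsBytes(), request) + "%");
    }
}
```

The metrics view stays above a 60% target even though the application itself is using far less, so HPA would keep the extra pods running.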


Memory-based HPA is a great tool for preventing pods from being killed when they run out of memory (OOM). It is important to note, however, that it is not a silver bullet for your application scaling problems, since the benefit of memory-based scaling depends heavily on the nature of your application.