As we automate more cluster administration activities, such as OpenShift upgrades and OS patching, it becomes harder to guarantee that the availability requirements of applications are met. In large clusters, the Ops team may not know in detail which pods make up an application, or whether each application's minimum capacity is being maintained. Without that knowledge, a simple rolling restart during server maintenance can inadvertently bring down or degrade multiple applications.
Introduced as Tech Preview in OpenShift 3.4 and now fully supported in OpenShift 3.6, PodDisruptionBudgets (henceforth PDBs) provide a concise way for the application team to communicate enforceable operating requirements to the cluster. Simply put, a PDB allows the application owner to define a minimum number of pods that must be available for that application to operate in a stable manner. Any action that uses the eviction API (such as drain) will honor that minimum at all times.
Let’s take a look at how to create a PDB and what enforcement looks like from inside OpenShift.
The PodDisruptionBudget Object
To illustrate this, we will use an OpenShift router as our example pod. The object below creates a PDB called router-pdb that uses a selector to match pods with the label router: router and ensures that at least one of those pods is available.
# cat router-pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: router-pdb
spec:
  selector:
    matchLabels:
      router: router
  minAvailable: 1
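As a rough illustration of how a matchLabels selector picks out pods, the Python sketch below treats a pod as covered when every key/value pair in the selector appears among the pod's labels. The extra deploymentconfig label and the registry pod are hypothetical, added only to show a match and a non-match; this is a simplified model, not the cluster's actual implementation.

```python
def matches(match_labels: dict, pod_labels: dict) -> bool:
    """Return True when every selector key/value pair is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Selector taken from router-pdb.yaml.
selector = {"router": "router"}

# Hypothetical pods and labels, for illustration only.
pods = {
    "router-2-kjs96": {"router": "router", "deploymentconfig": "router"},
    "registry-1-abcde": {"docker-registry": "default"},
}

covered = [name for name, labels in pods.items() if matches(selector, labels)]
print(covered)  # ['router-2-kjs96']
```

A pod may carry additional labels and still be covered; only the pairs named in the selector have to match.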
To create the PDB object, we need two pieces of information:
- The selector
- An appropriate minimum
Note: minAvailable can be expressed as an integer or as a percentage of total pods. If an application had 2 replicas, minAvailable: 1 and minAvailable: 50% would achieve the same goal.
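To make the integer and percentage forms concrete, here is a small Python sketch that turns either form into a pod count. The function name resolve_min_available is made up for this example, and the round-up behavior for percentages is an assumption based on upstream Kubernetes documentation; verify it against the version you run.

```python
def resolve_min_available(min_available, total_pods: int) -> int:
    """Turn an integer or an 'NN%' string into a minimum pod count."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Assumption: percentages round up to the next whole pod.
        # Integer arithmetic avoids floating-point surprises.
        return (percent * total_pods + 99) // 100
    return int(min_available)


print(resolve_min_available(1, 2))      # 1
print(resolve_min_available("50%", 2))  # 1
print(resolve_min_available("50%", 3))  # 2
```

With two replicas both forms give the same result, but they diverge as the replica count changes: under the rounding assumption above, 50% of three replicas becomes two pods, while minAvailable: 1 stays at one.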
Take a look at the router DeploymentConfig for that information:
# oc describe deploymentconfig router
Name: router
Namespace: default
Labels: router=router
Selector: router=router
Replicas: 2
---
---
Here we find our label and that the current number of replicas is two. Setting minAvailable to one gives us a disruption budget of one. That means only one of the two pods can be unavailable at any given time.
PodDisruptionBudgets in Practice
The first step is to create the PDB using the YAML we created above. Make sure you create the PDB in the project where the pods run:
# oc create -f router-pdb.yaml
poddisruptionbudget "router-pdb" created
Looking at the PDB, we can see that, as noted above, the allowed disruptions value is one and the minimum available is one.
# oc get poddisruptionbudget
NAME MIN-AVAILABLE ALLOWED-DISRUPTIONS AGE
router-pdb 1 1 13m
In more detail, the created PDB object looks like this:
# oc describe poddisruptionbudget router-pdb
Name: router-pdb
Min available: 1
Selector: router=router
Status:
Allowed disruptions: 1
Current: 2
Desired: 1
Total: 2
Next, let’s drain a node and see what happens:
# oc adm drain mrinfra1.example.com --grace-period=10 --timeout=10s
node "mrinfra1.example.com" cordoned
pod "router-2-t0z9g" evicted
node "mrinfra1.example.com" drained
Our infra node was successfully cordoned, the router pod was evicted, and the drain completed successfully.
Viewing the pods, we can see that one router is still running and one is pending.
# oc get pods
NAME READY STATUS RESTARTS AGE
router-2-kjs96 1/1 Running 0 42d
router-2-lbjbk 0/1 Pending 0 <invalid>
The second router is pending because mrinfra1.example.com is still SchedulingDisabled from the drain.
# oc get nodes
NAME STATUS AGE
mrmaster1.example.com Ready,SchedulingDisabled 54d
mrmaster2.example.com Ready,SchedulingDisabled 54d
mrmaster3.example.com Ready,SchedulingDisabled 54d
mrinfra1.example.com Ready,SchedulingDisabled 54d
mrinfra2.example.com Ready 54d
mrnode1.example.com Ready 54d
mrnode2.example.com Ready 54d
mrnode3.example.com Ready 54d
mrnode4.example.com Ready 54d
Inspecting the PDB, we can see that our allowed disruptions have gone from one to zero, indicating that the application or service can no longer tolerate additional pods being down.
# oc get poddisruptionbudget
NAME MIN-AVAILABLE ALLOWED-DISRUPTIONS AGE
router-pdb 1 0 23m
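The change from one to zero follows from simple arithmetic. The Python sketch below assumes the budget is the number of currently healthy pods minus the desired minimum, never dropping below zero; the function and parameter names are illustrative and mirror the Current and Desired fields shown by oc describe.

```python
def allowed_disruptions(current_healthy: int, desired_healthy: int) -> int:
    """Pods that may still be evicted without dropping below the minimum."""
    return max(0, current_healthy - desired_healthy)


# Before the first drain: two healthy routers, minimum of one.
before = allowed_disruptions(current_healthy=2, desired_healthy=1)

# After the first drain: one router Running, the other Pending.
after = allowed_disruptions(current_healthy=1, desired_healthy=1)

print(before, after)  # 1 0
```

A Pending pod does not count as healthy, which is why the budget stays at zero until the replacement router is up.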
What happens if we try to drain our other infrastructure node? I have added a grace period and timeout to show the failure:
# oc adm drain mrinfra2.example.com --grace-period=10 --timeout=10s
node "mrinfra2.example.com" cordoned
There are pending pods when an error occurred: Drain did not complete within 10s
pod/router-2-kjs96
error: Drain did not complete within 10s
The drain operation failed because there was no room left in the PDB. If you look at the logs, you will see the eviction request return an HTTP 429 (Too Many Requests), which in the case of PDBs means the request failed for now but may be retried and succeed later.
# journalctl -u atomic-openshift-node.service | grep 'router-8-1zm7k'
---
I0830 11:00:36.593260 12112 panics.go:76] POST /api/v1/namespaces/default/pods/router-8-1zm7k/eviction: (11.416163ms) 429
---
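The retry semantics of the 429 response can be sketched as a simple loop. In the Python example below, evict is a hypothetical stand-in for a client call to the eviction endpoint, and the 200 returned on the third attempt is only a placeholder for "any status other than 429"; none of this is the code that oc adm drain actually runs.

```python
import time


def evict_with_retry(evict, pod: str, attempts: int = 5, delay: float = 0.0):
    """Call evict(pod) until it stops returning 429 or attempts run out.

    Returns a (status, attempts_used) tuple.
    """
    for attempt in range(1, attempts + 1):
        status = evict(pod)
        if status != 429:
            return status, attempt
        # The budget is exhausted for now; wait and try again.
        time.sleep(delay)
    return 429, attempts


# Hypothetical API server: refuses twice while the budget is zero,
# then accepts once another router pod has become available.
responses = iter([429, 429, 200])
result = evict_with_retry(lambda pod: next(responses), "router-2-kjs96")
print(result)  # (200, 3)
```

A drain with a timeout gives up once its attempts run out, while a drain without one keeps retrying until the budget allows the eviction.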
Running the same drain again with no timeout, you would see it wait indefinitely for the eviction to succeed:
# oc adm drain mrinfra2.example.com
node "mrinfra2.example.com" cordoned
<WAITING>
pod "router-2-kjs96" evicted
node "mrinfra2.example.com" drained
While the above drain is waiting, make mrinfra1.example.com schedulable again:
# oc adm manage-node mrinfra1.example.com --schedulable
NAME STATUS AGE
mrinfra1.example.com Ready 54d
Watching your pods as that happens, you see that router-2-kjs96 is still running. After that, router-2-lbjbk goes from Pending to ContainerCreating to Running. Once the new pod is running and ready, the allowed disruptions go back to one and the drain terminates router-2-kjs96. If the pod terminates successfully, the drain completes. When mrinfra2.example.com is marked schedulable again, the second router replica redeploys as well.
# oc get pods -o wide -w
NAME READY STATUS RESTARTS AGE NODE
router-2-kjs96 1/1 Running 0 42d mrinfra2.example.com
router-2-lbjbk 0/1 Pending 0 5m <none>
router-2-lbjbk 0/1 ContainerCreating 0 6m mrinfra1.example.com
router-2-lbjbk 0/1 Running 0 6m mrinfra1.example.com
router-2-lbjbk 1/1 Running 0 6m mrinfra1.example.com
router-2-kjs96 1/1 Terminating 0 42d mrinfra2.example.com
router-2-gqhh6 0/1 Pending 0 0s <none>
router-2-kjs96 0/1 Terminating 0 42d mrinfra2.example.com
router-2-gqhh6 0/1 Pending 0 31s <none>
router-2-gqhh6 0/1 ContainerCreating 0 31s mrinfra2.example.com
router-2-gqhh6 0/1 Running 0 37s mrinfra2.example.com
router-2-gqhh6 1/1 Running 0 51s mrinfra2.example.com
As clusters continue to grow, PDBs offer an elegant way to define the needs of the application as a first class citizen. Now is a great time to start the discussion with your development teams!