In Part 1 of this series on the Open Policy Agent (OPA), we gave a brief rundown of why you might want to use the OPA Gatekeeper controller for policy enforcement in your Kubernetes clusters. We also gave a few examples of OPA’s query language, Rego, and of the Kubernetes Custom Resource Definitions (CRDs) that OPA Gatekeeper uses and creates.
This follow-up post dives into practical aspects of writing and implementing OPA policies for Kubernetes clusters, demonstrating a working example that can be used to restrict a pod’s allowed tolerations of node taints.
The Problem: Node Taints and Tolerations
Taints and tolerations provide one mechanism for fine-grained control over which nodes a pod can be placed on. By default, the Kubernetes scheduler places pods mainly on the basis of resource availability, such as CPU and memory. Pods can use node affinity to request placement on certain nodes. While node affinity rules express scheduling preferences from the pod's side, node taints manage pod placement from the node's point of view. For example, when you instruct Kubernetes to drain a node of its pods, Kubernetes gives that node the taint node.kubernetes.io/unschedulable:NoSchedule to signal to the scheduler not to add new pods. You may also want to taint nodes that have expensive Graphics Processing Units (GPUs) to limit their availability to pods that need them, or taint certain nodes to isolate sensitive workloads on them.
To permit the Kubernetes scheduler to place a given pod on a node with a given taint, you can add a matching toleration to that pod's specification. A toleration does not force the scheduler to place the pod on a tainted node, but it does give the scheduler that option. Combined with node affinity rules, tolerations can steer a pod onto a specific set of tainted nodes.
While these native Kubernetes controls provide a great deal of control over workload scheduling, one piece is missing. What if you do not want certain pods to be able to tolerate certain node taints? Kubernetes offers no built-in way to declare that only certain pods can have certain taint tolerations. OPA Gatekeeper constraint policies can fill that enforcement gap.
Restricting Taint Tolerations with OPA Gatekeeper
Setup
Follow these steps if you would like to work through the examples in this post.
- Check out the GitHub repository stackrox/blog-examples and navigate to the subdirectory code/opa-gatekeeper-taint-tolerations. Due to the length of the example files, we will not quote them in full here. If you prefer, you can follow along in the online repository without cloning it locally.
- Create or use a Kubernetes test cluster of version 1.14 or later.
- Deploy OPA Gatekeeper. (This post’s files were tested against Gatekeeper release v3.1.0-beta.8.)
- Install the opa command-line tool on the computer where you checked out the blog-examples repository. You may want to match the opa version to the OPA version compiled into your installed Gatekeeper release. For example, Gatekeeper v3.1.0-beta.8 uses OPA v0.17.2, which is not the latest release.
Repository structure
.
└── opa-gatekeeper-taint-tolerations
    ├── README.md
    ├── constraint.yaml
    ├── constraint_template.yaml
    ├── hello-world.yaml
    ├── src.rego
    └── src_test.rego
The core files:
- src.rego contains the OPA Rego code for our taint toleration policy.
- src_test.rego contains the corresponding test cases for our policy Rego.
- constraint_template.yaml contains the Gatekeeper ConstraintTemplate custom resource (CR) for our policy. Note that this file also contains the code from src.rego inline, but the opa tool cannot parse the manifest YAML, so we need to copy the Rego code out to a separate file for testing. If you use this layout for your own policies, remember to synchronize code changes between the two files.
Additional files:
- constraint.yaml contains the manifest for a sample RestrictedTaintToleration constraint.
- hello-world.yaml contains a minimal deployment to demonstrate the constraint in practice.
ConstraintTemplate
At first glance, constraint_template.yaml looks like many other Kubernetes resource manifests. A few special fields to note:
- spec.crd.spec defines the Custom Resource Definition (CRD) of our new constraint type. The names field sets the resource type's name; only kind is required. The validation field is optional, but with it you can define the CR's field names and types using an OpenAPI v3 schema, which lets the Kubernetes API server verify and enforce some manifest correctness.
- spec.targets contains the policy Rego code, in the rego field of the admission.k8s.gatekeeper.sh target.
When we apply this manifest in a Kubernetes cluster with Gatekeeper installed, the Gatekeeper controller creates a new CRD for the RestrictedTaintToleration kind. We can then create RestrictedTaintToleration objects in the cluster to define which objects the Gatekeeper admission webhook will reject when the Kubernetes API server sends them for review.
Writing a Policy
We outlined a few basics of writing Rego and ConstraintTemplates in Part 1. Keep these additional points in mind, especially when writing policies that do more than check whether a given field exists.
Policy Inputs
Gatekeeper passes data to each policy evaluation in the input object, in JSON form. The fields of the constraint's spec.parameters appear in input.parameters, and input.review.object contains the Kubernetes API spec of the object sent for evaluation.
Violations
- The violation rule forms the crux of a Gatekeeper policy. Your policy can contain zero or more violation definitions, although with no violations defined, all objects will pass.
- A violation block triggers a policy violation only if all of the block's statements evaluate to true.
- One or more triggered violations in a policy evaluation signal Gatekeeper to deny an admission request. Gatekeeper does not stop evaluating the policy after the first violation, so users can see every reason an object failed.
- Note that this usage of violation in Gatekeeper differs from standalone OPA's allow/deny conventions. Rego policies generally cannot be used in both Gatekeeper and non-Gatekeeper OPA without some modifications.
Our policy has two violation blocks. One tests for exact matches to our restricted taint, while the other applies when a pod has a global toleration that our constraint's configuration does not allow.
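To make these rules concrete, here is a simplified, hypothetical sketch of a policy with two violation blocks, written in the same pre-1.0 Rego syntax as the rest of this post. It is not the contents of src.rego: the package name is made up, the first block compares only the taint key, and the parameter names restrictedTaint and allowGlobalToleration are borrowed from the test inputs shown later. Each block fires only when every statement in it is true, and a pod that trips both blocks produces two messages.

```rego
package restrictedtaintsketch

# Hypothetical, simplified stand-in for src.rego: it compares only the taint key.
taint = input.parameters.restrictedTaint

# Fall back to an empty list when the pod spec has no tolerations field.
tolerations = object.get(input.review.object.spec, "tolerations", [])

# Block 1: some toleration names the restricted taint's key.
violation[{"msg": msg}] {
    t := tolerations[_]
    object.get(t, "key", "") == taint.key
    msg := sprintf("Toleration is not allowed for taint %v", [taint])
}

# Block 2: a global toleration (no key, operator Exists) while the
# constraint does not set allowGlobalToleration to true.
violation[{"msg": msg}] {
    not input.parameters.allowGlobalToleration
    t := tolerations[_]
    object.get(t, "key", "") == ""
    object.get(t, "operator", "Equal") == "Exists"
    msg := "Global tolerations are not allowed by this constraint"
}
```

Because violation is a set, Gatekeeper sees the union of whatever both blocks produce, which is why a single admission denial can list several reasons.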
Comprehensions and Undefined Fields
We use Rego comprehensions to find tolerations that match our restricted taint and to check if the pod has a global toleration that matches all taints.
global_tolerations := [key | k := object.get(tolerations[_], "key", "")
                             k == ""
                             key := k]
Global tolerations have only an operator field defined. They are the only toleration type that does not have a key, so we check the pod's tolerations for those without a valid key field. Note that we do not test the key field directly using the path-based tolerations[_].key notation. Instead, we use the built-in function object.get, which takes three arguments: (1) the object whose element we want to read, (2) the field name that we want to read, and (3) a default value to return if that field is undefined. If we read tolerations[_].key directly, the lookup would be undefined for every toleration without a key field. Rego treats an undefined expression as a failed statement, so the comprehension would silently skip exactly the global tolerations we are looking for, and global_tolerations would be empty even for a pod that has one.
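The difference is easy to see in isolation. In this hypothetical standalone module (the package and rule names are illustrative, not from src.rego), the same filter runs twice over a sample tolerations array: once with path-based access and once with object.get. Only the object.get version finds the global toleration; the path-based comprehension evaluates to an empty array.

```rego
package globaltolerationsketch

# Hypothetical sample data: one global toleration and one keyed toleration.
sample_tolerations = [
    {"operator": "Exists"},
    {"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}
]

# Path-based access: the .key lookup is undefined for the global toleration,
# so that iteration fails and the element is silently skipped.
path_based = [k | k := sample_tolerations[_].key; k == ""]

# object.get substitutes "" for the missing key, so the global toleration
# passes the k == "" filter.
with_object_get = [k | k := object.get(sample_tolerations[_], "key", ""); k == ""]
```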
We use object.get several times in this policy to avoid accessing optional fields that may not be set in our input spec. Failing to handle values that could be undefined can produce unintended outcomes, such as a violation rule that silently never fires.
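As a further illustration, the hypothetical helper below (normalize and pod_tolerations are not names from src.rego) uses object.get to fill in the defaults Kubernetes applies to a toleration: an omitted operator means Equal, and an empty effect matches all taint effects. The pod_tolerations rule also falls back to an empty list when the pod spec has no tolerations field at all, so later rules never iterate over an undefined value.

```rego
package normalizesketch

# Hypothetical helper: return a toleration with Kubernetes' defaults made
# explicit. An omitted operator defaults to "Equal"; an empty effect
# matches all taint effects.
normalize(t) = n {
    n := {
        "key": object.get(t, "key", ""),
        "operator": object.get(t, "operator", "Equal"),
        "value": object.get(t, "value", ""),
        "effect": object.get(t, "effect", "")
    }
}

# A pod without a tolerations field yields [] instead of an undefined value.
tolerations = object.get(input.review.object.spec, "tolerations", [])

pod_tolerations = [normalize(t) | t := tolerations[_]]
```

Normalizing once up front keeps later comparisons free of object.get calls, at the cost of one extra pass over the tolerations.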
Functions
Our policy defines several functions to match a pod’s tolerations against our restricted taint. Note that we define some functions more than once.
# If effect is empty, match any
effect_check("") {
    true
}

# Otherwise, specific effect must match
effect_check(effect) {
    effect == taint.effect
}
The first definition matches only when effect_check receives an empty string, and its body always succeeds, so a toleration with no effect matches any taint effect. The second definition binds any argument to effect and succeeds only if that value equals taint.effect. Rego evaluates every definition whose arguments match the call, and the function is true if any of them succeeds.
Rego has no if-then-else statement inside rule bodies. Defining a function more than once, as above, is one way to express conditional logic.
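The same pattern can handle the toleration operator. In this hypothetical sketch, value_check and the package name are illustrative rather than taken from src.rego: an Exists toleration matches any value of the taint's key, so the first definition always succeeds, while an Equal toleration must carry the restricted taint's value. Because the first argument is a different literal in each head, at most one definition can apply to any call.

```rego
package operatorchecksketch

# The restricted taint from the constraint's parameters.
taint = input.parameters.restrictedTaint

# Exists: the toleration matches any value for the key.
value_check("Exists", _) {
    true
}

# Equal: the toleration's value must equal the restricted taint's value.
value_check("Equal", value) {
    value == taint.value
}
```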
Testing a Policy
If your enforcement use case was important enough to necessitate writing a policy, it should also warrant tests for that policy.
When writing test coverage for your Gatekeeper policy, you want to consider the following points carefully.
- What Kubernetes API resource fields does my policy query? Are any of them optional? Can they appear more than once in a spec?
- How many positive test cases do I need to write to make sure my policy will do what I expect?
- How many negative test cases do I need to write to make sure my policy will not produce results that I do not want?
Writing Tests
Policy tests are also written in Rego. By convention, they live in the same directory as the source file, in a file whose name ends in _test.rego. In our case, the tests are in src_test.rego, to correspond with src.rego. Note the matching package name at the top of each file.
Test rule names must begin with the prefix test_, which is how opa test finds them. Let's take a look at the first test in the file.
test_input_no_global_violation {
    mock_input := { "review": input_review_global,
                    "parameters": input_parameters_no_global }
    results := violation with input as mock_input
    count(results) > 0
}
First, we define the mock_input object variable with a review field, whose contents come from the input_review_global declaration later in the file, and a parameters field, set to the value of the input_parameters_no_global object, also defined later in the file. (We avoid naming this variable input so that it does not shadow the input document inside the test.)
As we said earlier, input.review holds the admission request sent to Gatekeeper for evaluation, with the object itself under input.review.object, while input.parameters holds the constraint's configuration.
Our mock object spec is:
input_review_global = {
    "object": {
        "spec": {
            "tolerations": [
                {
                    "operator": "Exists"
                }
            ]
        }
    }
}
Our mock object does not need to comprise a complete pod manifest. We need to define only the fields that our policy requires for evaluation.
For our constraint parameters:
input_parameters_no_global = {
    "restrictedTaint": {
        "key": "taintname",
        "value": "taintvalue",
        "effect": "NoSchedule"
    },
    "allowGlobalToleration": false
}
Our mock pod spec from input_review_global has only one toleration defined, a global toleration. However, our constraint parameters set allowGlobalToleration to false, meaning we do not want to allow pods with global tolerations to tolerate this restricted taint. Therefore, in the last line of our test rule, we expect the number of violation results to be greater than zero: Gatekeeper should deny a matching request.
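A natural complement to this test is the negative case, where allowGlobalToleration is true and no violation should fire; judging by its name, test_input_ok_global_allow in the run below covers it. The sketch below pairs a positive and a negative test in one self-contained module. Its single violation rule, the package name, and the review_global and mock_input names are hypothetical, simplified stand-ins rather than the code in src.rego and src_test.rego.

```rego
package globaltestsketch

# Hypothetical stand-in for the global-toleration violation in src.rego.
violation[{"msg": msg}] {
    not input.parameters.allowGlobalToleration
    t := input.review.object.spec.tolerations[_]
    object.get(t, "key", "") == ""
    t.operator == "Exists"
    msg := "Global tolerations are not allowed by this constraint"
}

review_global = {"object": {"spec": {"tolerations": [{"operator": "Exists"}]}}}

# Positive case: the constraint forbids global tolerations, so expect a violation.
test_global_denied {
    mock_input := {"review": review_global, "parameters": {"allowGlobalToleration": false}}
    results := violation with input as mock_input
    count(results) > 0
}

# Negative case: the constraint allows global tolerations, so expect none.
test_global_allowed {
    mock_input := {"review": review_global, "parameters": {"allowGlobalToleration": true}}
    results := violation with input as mock_input
    count(results) == 0
}
```

Keeping both cases next to each other makes it obvious when a change to the policy breaks one direction but not the other.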
Running Tests
We can use the opa command-line tool to run our tests.
$ opa test --explain fails src.rego src_test.rego
data.restrictedtainttoleration.test_input_no_global_violation: PASS (7.099785ms)
data.restrictedtainttoleration.test_input_ok_global_allow: PASS (474.856µs)
data.restrictedtainttoleration.test_input_no_global_equal_match_violation: PASS (563.708µs)
data.restrictedtainttoleration.test_input_ok_global_equal_match_allow: PASS (455.672µs)
data.restrictedtainttoleration.test_input_equal_match_violation: PASS (870.17µs)
data.restrictedtainttoleration.test_input_equal_no_effect_match_violation: PASS (660.812µs)
data.restrictedtainttoleration.test_input_equal_no_operator_match_violation: PASS (1.14774ms)
data.restrictedtainttoleration.test_input_equal_no_effect_no_operator_match_violation: PASS (928.342µs)
data.restrictedtainttoleration.test_input_equal_different_value_match_allow: PASS (424.302µs)
data.restrictedtainttoleration.test_input_no_toleration_field_allow: PASS (330.352µs)
--------------------------------------------------------------------------------
PASS: 10/10
Testing in Kubernetes
Now we are ready to test our Gatekeeper constraint in an actual Kubernetes cluster.
- Apply the ConstraintTemplate:
  kubectl apply -f constraint_template.yaml
- Apply the sample constraint:
  kubectl apply -f constraint.yaml
  (This file needs to be applied separately from and after the ConstraintTemplate object, because the Kubernetes API server will reject this manifest if its resource type does not yet exist.)
- Apply a deployment whose pod template tolerates the restricted taint, and watch Gatekeeper block the pod from being created:
  kubectl apply -f hello-world.yaml
$ kubectl get events --sort-by=.metadata.creationTimestamp
LAST SEEN TYPE REASON OBJECT MESSAGE
[...]
2m Normal ScalingReplicaSet deployment/hello-world Scaled up replica set hello-world-675bf47d7f to 1
37s Warning FailedCreate replicaset/hello-world-675bf47d7f Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [denied by privileged] Toleration is not allowed for taint {"effect": "NoSchedule", "key": "privileged", "value": "true"}
119s Warning FailedCreate replicaset/hello-world-675bf47d7f Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [denied by privileged] Toleration is not allowed for taint {"value": "true", "effect": "NoSchedule", "key": "privileged"}
[...]
Gatekeeper evaluates our policy against all pod requests in any namespace except kube-system. That scoping comes from the spec.match field of our RestrictedTaintToleration constraint; every constraint type that Gatekeeper creates supports this field.
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
The kinds field matches specific Kubernetes API groups and resource kinds. We use another matcher, excludedNamespaces, to limit evaluation to resources outside the kube-system namespace. You can read about more constraint matching options in the Gatekeeper docs.
Wrapping Up
You can find more examples of Gatekeeper policies in the GitHub repo for reference or use in your clusters.
We hope you have a better idea of what OPA Gatekeeper policies can do and the requirements for writing reliable policies. Gatekeeper opens up a lot of possibilities for enforcing security best practices and general consistency of cluster resource configurations. OPA and Gatekeeper provide the ability to manage object configuration best practices and conventions to a very fine degree. However, creating effective and safe policies requires defining and writing comprehensive test cases. A faulty policy could allow unwanted objects into the cluster or keep acceptable objects out. Policy writing also requires a strong knowledge of the Kubernetes API specifications in question, although writing policies can serve as motivation and an exercise to learn more about resource specs.