Air-gapped environments are those that are physically isolated from other networks, but most importantly isolated from the Internet. No proxies, no jump hosts - nothing. The only way to get data into or out of the environment is via manual data transfer or, if you are extremely lucky, a one-way data diode. Most of the time you’ll be relying on manual data transfers.
An Operator needs to be written with air-gapped environments in mind, but thankfully the key requirements are all fairly straightforward. If you follow the recommendations below you’ll go a long way to creating an Operator that works in air-gapped environments.
That’s great, but why should I care?
Many Kubernetes and OpenShift deployments are in air-gapped environments. These are particularly common in Government, Financial Services, and really any industry where specific data or workloads need to be kept strongly isolated from the public Internet. If you don’t consider these environments, you’re artificially limiting the user base that can consume your Operator.
Some Operators will never be suited for an air-gapped environment if they have hard and immovable dependencies on Internet-based resources - but for everyone else, there are a few small things you can do to maximise the “air-gap friendliness” of your Operator.
How do we make an “air-gap friendly” Operator?
It’s not as complicated as you might think - just think NIRDD:
- Never hardcode a URL.
- Inject trusted certificates.
- Related images.
- Digests, not tags.
- Documentation.
Let’s go down the list!
Never hardcode a URL
Hardcoding Internet URLs into your Operator is the number one way to stop it from working in an air-gapped environment. Image references to Internet-based registries are probably the primary culprit here.
Don’t hardcode anything. Always allow users to specify these details through ConfigMaps, environment variables, Secrets, or - my preference - as additional fields in the spec for your custom resource. As an example, see how the Trident CSI driver allows overriding the source registry, repository and image.
Define defaults in the code of your Operator, but accept overrides from external sources. You’re probably 80% of the way to being air-gap friendly if all you do is allow overrides - 90% if you also document them!
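To make the "defaults in code, overrides from outside" idea concrete, here is a minimal Go sketch. The `ImageSpec` fields, the `OPERAND_*` environment variable names and the default image values are all hypothetical, chosen only for illustration; a real Operator would read the overrides from its own custom resource type.

```go
package main

import (
	"fmt"
	"os"
)

// ImageSpec mirrors hypothetical override fields on a custom resource spec.
// An empty field means "use the next source in the priority order".
type ImageSpec struct {
	Registry   string
	Repository string
	Tag        string
}

// defaultImage holds the built-in defaults (illustrative values only).
var defaultImage = ImageSpec{
	Registry:   "quay.io",
	Repository: "example/my-operand",
	Tag:        "v1.0.0",
}

// firstNonEmpty returns the first non-empty string from its arguments.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// resolveImage merges settings in priority order: custom resource spec,
// then environment variable (hypothetical names), then built-in default.
func resolveImage(spec ImageSpec) string {
	registry := firstNonEmpty(spec.Registry, os.Getenv("OPERAND_REGISTRY"), defaultImage.Registry)
	repository := firstNonEmpty(spec.Repository, os.Getenv("OPERAND_REPOSITORY"), defaultImage.Repository)
	tag := firstNonEmpty(spec.Tag, os.Getenv("OPERAND_TAG"), defaultImage.Tag)
	return fmt.Sprintf("%s/%s:%s", registry, repository, tag)
}

func main() {
	// No overrides in the spec: environment variables or defaults apply.
	fmt.Println(resolveImage(ImageSpec{}))
	// An air-gapped user points the Operator at an internal mirror registry.
	fmt.Println(resolveImage(ImageSpec{Registry: "registry.internal.example:5000"}))
}
```

Keeping the merge logic in one function means every image the Operator deploys goes through the same override path, which also makes the overrides easier to document.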
Inject trusted certificates
This is good practice regardless, but in air-gapped environments it is even more important. You must not rely on external services presenting certificates signed by a CA in your container’s default trust store. Always give users a means to inject additional trusted certificates. My go-to technique is to let users provide the name of a ConfigMap that you will mount directly into your container in the correct location.
For one such example, have a look at how the Gitea Operator allows users to specify a ConfigMap as an extra field on the CustomResourceDefinition, and injects that ConfigMap into a known location for golang to use.
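Mounting the ConfigMap into a location the runtime already reads needs no code at all. If you would rather load the bundle explicitly, the following Go sketch shows one possible approach: it adds user-supplied CAs to the system trust store and builds an HTTP client from the result. The mount path `/etc/operator/extra-ca/ca-bundle.crt` and the function names are assumptions for illustration, not taken from any particular Operator.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// extraCAPath is a hypothetical mount point for the user-supplied ConfigMap.
const extraCAPath = "/etc/operator/extra-ca/ca-bundle.crt"

// poolWithExtraCAs returns the system trust store plus the PEM-encoded
// certificates in pemBytes. It fails if pemBytes holds no usable certificate.
func poolWithExtraCAs(pemBytes []byte) (*x509.CertPool, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		// No system store available: start from an empty pool instead.
		pool = x509.NewCertPool()
	}
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no valid PEM certificates found in extra CA bundle")
	}
	return pool, nil
}

// newHTTPClient builds a client that trusts the extra CAs when the bundle
// file exists, and falls back to the default client when it does not.
func newHTTPClient(bundlePath string) (*http.Client, error) {
	pemBytes, err := os.ReadFile(bundlePath)
	if errors.Is(err, os.ErrNotExist) {
		return http.DefaultClient, nil
	}
	if err != nil {
		return nil, err
	}
	pool, err := poolWithExtraCAs(pemBytes)
	if err != nil {
		return nil, err
	}
	// Clone the default transport so proxy and timeout settings are kept.
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.TLSClientConfig = &tls.Config{RootCAs: pool}
	return &http.Client{Transport: transport}, nil
}

func main() {
	client, err := newHTTPClient(extraCAPath)
	if err != nil {
		fmt.Println("could not load extra CAs:", err)
		return
	}
	fmt.Println("HTTP client ready; custom transport in use:", client.Transport != nil)
}
```

Treating a missing bundle as "no extra CAs" keeps the Operator working unchanged in connected environments, while a present but invalid bundle is reported as an error rather than silently ignored.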
Related images
Operators generally orchestrate other containers in support of an application. They need to generate Kubernetes resources that refer to container images and so those container images need to exist in the air-gapped environment. If there’s no listed source for exactly which containers your Operator requires, the user will never know to mirror them, and your Operator will fail to deploy with ErrImagePull errors.
Your users start playing “whack-a-mole” as they mirror containers one-by-one until your Operator deploys - or they give up and find another Operator to try. There are two solutions to this problem:
Option A is just to document your image dependencies. This is perfectly OK and is the minimum standard to aim for.
Option B is to do Option A and also use the `relatedImages` field on the ClusterServiceVersion.
relatedImages allows you to explicitly list which extra images your Operator can use, and to explicitly mark those as ‘required’ so that installation will fail if they are not available in the environment. Even better, this will allow images your Operator requires to be captured by the mirroring capabilities of the `oc` tool - and that makes mirroring your Operator much simpler.
For detail on using relatedImages, take a look at the proposal document.
Unsurprisingly, I like to combine Option A and Option B, and I recommend you do too!
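One way to keep Option A and Option B in step is to hold the image list in a single place and generate both the documentation and the `relatedImages` entries from it. The Go sketch below assumes that approach; the image names and digests are placeholders, and the output is printed as JSON purely for simplicity. Each entry uses the `name` and `image` pair that `relatedImages` entries carry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RelatedImage holds the name/image pair used by each entry in the
// relatedImages list of a ClusterServiceVersion.
type RelatedImage struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

// requiredImages is the single source of truth for operand images.
// The repositories and digests below are placeholders, not real images.
func requiredImages() []RelatedImage {
	return []RelatedImage{
		{
			Name:  "database",
			Image: "quay.io/example/database@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
		},
		{
			Name:  "web",
			Image: "quay.io/example/web@sha256:fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210",
		},
	}
}

// imageFor looks up the image the Operator should deploy for a component,
// so reconcile code and the published list cannot drift apart.
func imageFor(name string) (string, bool) {
	for _, img := range requiredImages() {
		if img.Name == name {
			return img.Image, true
		}
	}
	return "", false
}

// renderRelatedImages serialises the list for use in docs or manifests.
func renderRelatedImages(images []RelatedImage) (string, error) {
	out, err := json.MarshalIndent(images, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := renderRelatedImages(requiredImages())
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```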
Digests, not tags
While it is possible to create transparent registry mirrors using ImageContentSourcePolicy custom resources, ICSPs generate mirror configuration that only allows mirrors to take effect when the image is pulled by digest. The mirror will never be used if you configure that mirror via an ICSP and then try to pull with a tag reference.
You can create your own custom registries.conf configuration using MachineConfig objects and explicitly allow mirror overrides by tag; however, this is a step your users will need to complete themselves unless you provide tooling for them.
Where possible, use digests to refer to the images your Operator requires. There are a couple of advantages here:
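The digest-only behaviour described above can be pictured with a toy model in Go. This is not the container runtime's actual logic, only a simplified sketch of the rule: mirrors are considered for digest references and skipped for tag references. The `Mirror` type, the function names and the registry hostnames are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Mirror maps a source repository to the repository that mirrors it.
type Mirror struct {
	Source string
	Mirror string
}

// isDigestRef reports whether ref pins an image by digest (name@sha256:...).
func isDigestRef(ref string) bool {
	return strings.Contains(ref, "@sha256:")
}

// pullCandidates lists the locations that would be tried, in order, under a
// digest-only mirror policy. Tag references never see the mirrors.
func pullCandidates(ref string, mirrors []Mirror) []string {
	candidates := []string{}
	if isDigestRef(ref) {
		for _, m := range mirrors {
			if !strings.HasPrefix(ref, m.Source) {
				continue
			}
			rest := strings.TrimPrefix(ref, m.Source)
			// Only match on a repository boundary, not a partial name.
			if rest != "" && (rest[0] == '@' || rest[0] == '/') {
				candidates = append(candidates, m.Mirror+rest)
			}
		}
	}
	// The original location is always the final fallback.
	return append(candidates, ref)
}

func main() {
	mirrors := []Mirror{{
		Source: "quay.io/example/app",
		Mirror: "registry.internal.example:5000/example/app",
	}}
	// Digest reference: the mirror is tried first.
	fmt.Println(pullCandidates("quay.io/example/app@sha256:abc123", mirrors))
	// Tag reference: only the unreachable Internet registry is tried.
	fmt.Println(pullCandidates("quay.io/example/app:v2.1", mirrors))
}
```

In an air-gapped cluster the final fallback is unreachable, so a tag reference in this model has nowhere left to pull from.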
Firstly, you know exactly what images a particular version of your Operator is running. This simplifies your troubleshooting and eliminates tagging mistakes as a source of problems - for example, the operator deploys the “v2.1” tag but a tired sysadmin has mistakenly tagged “v2.1” against an image that is actually version 2.0 of the code.
Secondly, you can use existing ImageContentSourcePolicy resources without customisation.
If you’d prefer to use tags there’s a little more work involved - either you need to allow users to override your registry, repository and image (as described above, and my personal preference), or, for OpenShift deployments, you need to document how users can create the custom registries.conf mirror configuration that permits pulls by tag.
Documentation
Last, but not least, don’t forget some documentation. Describe how to:
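If you do commit to digests, a small check in your build pipeline can catch tag references before they ship. The Go sketch below is one possible check, assuming sha256 digests and a plain list of image references as input; the function name and sample references are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// digestRef matches references that end in a sha256 digest,
// for example repo@sha256:<64 lowercase hex characters>.
var digestRef = regexp.MustCompile(`^[^@\s]+@sha256:[a-f0-9]{64}$`)

// unpinnedImages returns the references that are not pinned by digest.
func unpinnedImages(refs []string) []string {
	var unpinned []string
	for _, ref := range refs {
		if !digestRef.MatchString(ref) {
			unpinned = append(unpinned, ref)
		}
	}
	return unpinned
}

func main() {
	refs := []string{
		// Placeholder digest, not a real image.
		"quay.io/example/app@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
		"quay.io/example/sidecar:v2.1",
	}
	for _, ref := range unpinnedImages(refs) {
		fmt.Println("not pinned by digest:", ref)
	}
}
```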
- Bundle up your Operator and all associated artifacts for transfer across to the air-gapped environment.
- Identify and pull all required images that your Operator attempts to deploy.
- Install on the air-gapped side.
- Configure the container image, repository and registry (if needed).
- Import/export artifacts and extra content that your Operator may rely on (e.g. OVAL data).
Don’t forget your friendly air-gapped users
Air-gaps are common in high-security or high-sensitivity environments, but don’t let an air-gap prevent your Operator from running.
Just remember to think NIRDD - Never hardcode URLs, Inject certificates, list Related images, use Digests rather than tags, and don’t forget your Documentation.
Looking for a list of Red Hat Operators that work in Disconnected/Air-Gapped environments? Have a look at this knowledge base article.