Introduction to Container Security
Container security gives system and infrastructure administrators and security engineers plenty to think about. The basic approach is defense in depth, but it takes considerable effort at both the platform and the application level. One reason is that Kubernetes platforms can run on different operating systems. The official documentation says:
Kubernetes is a project that can run using different operating systems and add-on components that offer no guarantees of supportability from the project. As a result, the security of different Kubernetes platforms can vary.
Security vendors and security solutions also approach these considerations in their own ways, but system hardening is one effective approach because it mitigates risk by eliminating potential attack vectors and reducing the system's attack surface. OpenShift is secure by default, as described in the OpenShift Security Guide Book, and has several built-in security features from the host OS up to container orchestration. To go further, the Compliance Operator was released in OCP 4.6. In this article, I will briefly show what is needed to secure or harden a Kubernetes cluster, and then discuss the differences between Red Hat Advanced Cluster Management, the Compliance Operator, and Open Policy Agent (OPA) to help you understand how the Compliance Operator helps secure and harden your OpenShift cluster.
What Should Be Hardened in Your Cluster?
To secure your OpenShift cluster, you must consider both the platform (Kubernetes) and the host OS (RHCOS, RHEL). Kubernetes is composed of control plane machines (masters) and worker machines (nodes). Kubernetes services such as the API server, etcd, and the controller manager run on the control plane to manage the workloads on the worker machines. Meanwhile, the CRI-O container engine manages the containers, and the kubelet receives requests for managing containers from the API server. Both CRI-O and the kubelet run on the worker machines to create and run containers. Understanding the OpenShift Container Platform control plane describes the control plane and its components in more detail.
The basic approach to hardening the host OS is almost the same as in RHEL 8 Security Hardening, whether the host OS is RHCOS or RHEL. For instance, it validates whether files have appropriately restrictive permissions, whether file ownership is set correctly, whether required systemd services or processes are launched with appropriate arguments or configuration parameters, and whether the appropriate kernel parameters are set.
Hardening the control plane is specific to the Kubernetes services and covers the control plane components and the master configuration files. It should validate whether the API server is started with restrictive arguments, for example allowing only specific admission plug-ins, enabling audit logging, applying etcd server and peer configurations, and restricting RBAC. For clusters that use RHCOS for all machines, updates and upgrades are designed to be automatic events driven from the central control plane, because OpenShift completely controls the systems and services that run on each machine, including the operating system itself, through the Machine Config Operator. You can learn more about the Machine Config Operator in OpenShift Container Platform 4: How Does Machine Config Pool Work? or Machine Config and Machine Updates.
So several questions may come to mind: Are there any benchmarks for hardening? How do you check a cluster against them? How do you fix the issues or apply the recommended configurations? How do you automate the whole process? And who should prepare the policy files?
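To make the host-level checks mentioned above more concrete, here is a minimal Python sketch of a file permission and ownership check. It is only an illustration of the idea, not how OpenSCAP or any benchmark implements it; the function name check_file and the threshold 0o600 are assumptions chosen for this example, and the demo runs against a temporary file rather than a real host path.

```python
import os
import stat
import tempfile


def check_file(path, max_mode, expected_uid=None):
    """Return a list of findings for one file; an empty list means it passes.

    check_file is a hypothetical helper for illustration only.
    """
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # keep only the permission bits
    findings = []
    # Any permission bit set outside max_mode counts as too permissive.
    if mode & ~max_mode:
        findings.append(
            f"{path}: mode {mode:04o} is more permissive than {max_mode:04o}"
        )
    if expected_uid is not None and st.st_uid != expected_uid:
        findings.append(f"{path}: owner uid {st.st_uid}, expected {expected_uid}")
    return findings


if __name__ == "__main__":
    # Demonstrate on a throwaway file instead of a real configuration file.
    with tempfile.NamedTemporaryFile() as tmp:
        os.chmod(tmp.name, 0o644)
        print(check_file(tmp.name, max_mode=0o600))  # reports one finding
        os.chmod(tmp.name, 0o600)
        print(check_file(tmp.name, max_mode=0o600))  # reports no findings
```

A real benchmark bundles many such rules (permissions, ownership, service arguments, kernel parameters) and evaluates them together, which is the kind of work that is tedious to maintain by hand.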
Why Compliance Operator?
Why is the Compliance Operator needed to validate the hardening and apply changes in the configuration of the operating system and the platform? The Compliance Operator is defined as follows:
The compliance operator is an OpenShift Operator that allows an administrator to run compliance scans and provide remediations for the issues found. The operator leverages OpenSCAP under the hood to perform the scans.
In other words, the operator scans the host and the platform against the profiles you specify to detect compliance gaps, and it creates summary reports so that you can find any configuration in the cluster that violates the policy. The reports also show which remediations are available, so you can choose whether to apply the recommended configuration by hand, step by step, or automatically. In short, the whole process goes like this: choose a profile to scan with, specify the scan settings, initiate the scan, and review the generated reports.
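As a rough sketch of the "choose a profile, specify the scan settings, initiate the scan" steps, the Python snippet below builds a ScanSettingBinding manifest and prints it as JSON, which oc apply -f accepts as well as YAML. The field layout reflects my understanding of the compliance.openshift.io/v1alpha1 API, and the profile name ocp4-cis, the ScanSetting name default, the namespace openshift-compliance, and the binding name cis-scan are assumptions for illustration; please verify them against the CRDs and profiles installed in your cluster.

```python
import json

# Assumed API group/version of the Compliance Operator resources.
API_GROUP = "compliance.openshift.io/v1alpha1"


def scan_setting_binding(name, profile, setting="default"):
    """Build a ScanSettingBinding that ties one profile to one ScanSetting.

    All names passed in are illustrative; check what exists in your cluster.
    """
    return {
        "apiVersion": API_GROUP,
        "kind": "ScanSettingBinding",
        "metadata": {"name": name, "namespace": "openshift-compliance"},
        # Which profile(s) to scan with.
        "profiles": [{"name": profile, "kind": "Profile", "apiGroup": API_GROUP}],
        # Which scan settings (schedule, storage, and so on) to use.
        "settingsRef": {"name": setting, "kind": "ScanSetting", "apiGroup": API_GROUP},
    }


if __name__ == "__main__":
    print(json.dumps(scan_setting_binding("cis-scan", "ocp4-cis"), indent=2))
```

Creating such a binding is what triggers the scan; the results and any remediations then show up as separate objects in the same namespace.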
Then a few more questions come up: Does OpenShift provide security guidelines or benchmarks for hardening, like the Kubernetes CIS benchmark? What is included in profiles, how are profiles created, who creates them, and how is the cluster scanned? The operator leverages OpenSCAP, a NIST-certified tool, to scan and enforce security policies. The security policies for the compliance checks are expressed as SCAP content and built from the community-based ComplianceAsCode/content project. A bundle of security policies is called a profile. The profiles available so far, either created by default when the operator is installed or scheduled for delivery, include NIST 800-53 Moderate (FedRAMP), Australian Cyber Security Centre (ACSC) Essential Eight, CIS OpenShift Benchmark, and others. You can also create or tailor your own profiles so that you can pick the rules you want to run instead of relying only on the profiles provided by default.
Apart from the policies, you might be curious how the scan process works. First of all, there are two types of scan: node scans, which target the compute machines, and cluster-level scans. A node scan runs OpenSCAP content directly on the worker machines to verify the operating system configuration, using a privileged pod with read access to the host on each worker machine to be scanned. A cluster-wide scan, meanwhile, targets the OpenShift cluster itself: it verifies the OpenShift and Kubernetes configuration by running a non-privileged pod that fetches Kubernetes API objects and has no access to the host.
What Is the Difference Between RHACM and OPA?
Red Hat Advanced Cluster Management for Kubernetes (RHACM) has three main features: multi-cluster management, application life cycle management, and policy-based governance, risk, and compliance. So what exactly does the compliance feature aim at, and how does it differ from the Compliance Operator? Compliance here means propagating policies with the Policy Controller from the hub cluster that runs RHACM to the managed clusters, so that the same API objects are in place across all of them.
Policies here refer to OpenShift and Kubernetes API objects or API resources, such as NetworkPolicy, Role, or SecurityContextConstraints. A collection of policies is maintained by the community-based Policy Collection project, which contains stable and community versions. Stable policies are supported by RHACM, while community policies are maintained by the open source community. The difference between the Compliance Operator and RHACM is that RHACM does NOT provide scanning or security policies based on OpenSCAP content; instead, it guarantees that the required configurations are loaded across the clusters, and Policy Collection includes configurations to secure your cluster. For instance, Example to configure an image policy defines the repositories from which OpenShift can pull images; the Trusted Container policy detects whether running pods are using trusted images; and the Trusted Node policy detects whether there are untrusted or unattested nodes in the cluster. For further reading about the compliance feature of RHACM, please see Comply to Standards Using Policy-Based Governance of Red Hat Advanced Cluster Management for Kubernetes.
So, what about OPA? OPA is a general-purpose policy engine for Envoy, Kubernetes, Kafka, and other systems that makes decisions based on policies written in a policy language called Rego. In Kubernetes, when you create API objects such as Pods or Services, admission controllers intercept requests to the API server before the object is persisted, but only after the request has been authenticated and authorized. Admission controllers are plug-ins. For instance, AlwaysPullImages forces the image pull policy of every new pod to Always, so that images are always pulled before containers run. A Guide to Kubernetes Admission Controllers describes why admission controllers help secure workloads and how they work on API requests. In Kubernetes, OPA builds on admission controllers: the API server can be configured to query OPA for admission control decisions when API objects are created, updated, or deleted. For instance, you can create a policy to guarantee that two ingresses in different namespaces never have the same hostname.
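The ingress hostname example can be sketched as plain decision logic. Real OPA policies are written in Rego and evaluated by OPA itself; the Python below is only an illustration of the kind of allow or deny decision such a policy would make, and the function names hosts_of and admit are hypothetical helpers for this sketch, not part of OPA or Kubernetes.

```python
def hosts_of(ingress):
    """Collect the hostnames declared in an Ingress object's rules."""
    rules = ingress.get("spec", {}).get("rules", [])
    return {rule["host"] for rule in rules if rule.get("host")}


def admit(new_ingress, existing_ingresses):
    """Decide whether a new Ingress may be created.

    Returns (allowed, clashes): the request is denied when one of its
    hostnames is already used by an Ingress in a different namespace.
    """
    new_ns = new_ingress["metadata"]["namespace"]
    taken = set()
    for ing in existing_ingresses:
        # Reusing a host inside the same namespace is not treated as a clash.
        if ing["metadata"]["namespace"] != new_ns:
            taken |= hosts_of(ing)
    clashes = sorted(hosts_of(new_ingress) & taken)
    return (not clashes, clashes)


if __name__ == "__main__":
    existing = [
        {"metadata": {"namespace": "team-a"},
         "spec": {"rules": [{"host": "shop.example.com"}]}},
    ]
    request = {"metadata": {"namespace": "team-b"},
               "spec": {"rules": [{"host": "shop.example.com"}]}}
    print(admit(request, existing))  # (False, ['shop.example.com'])
```

In an actual deployment, the API server would send the incoming object to OPA through an admission webhook, and the Rego policy would compare it against the ingresses OPA already knows about before answering.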
In short, the policy controller in RHACM and OPA overlap in a few areas, but OPA provides granular policy control over API requests through your own policies written in Rego, while the policy controller propagates API objects from Policy Collection across the managed clusters from the hub cluster. If you need to manage several clusters, you can use RHACM to ensure that the Compliance Operator or OPA runs on all the managed clusters, or to apply base configurations using Policy Collection. Last but not least, the Compliance Operator leverages OpenSCAP and is responsible for scanning the host OS and the Kubernetes platform using OpenSCAP content and for fixing configurations that violate the policies. All of them aim at securing or hardening the OpenShift or Kubernetes cluster, each from a different angle.