Despite growing adoption, security remains the top concern when it comes to containers and Kubernetes. The good news is that there are many things you can do to make your implementation of containers more secure. This introduction to securing Kubernetes outlines many of the steps you can take to get started with container security.
Building secure images
Container images are often a key source of vulnerabilities that can be introduced into cloud-native environments. A core part of an effective security strategy is to ensure adherence to secure image-building practices across the organization. While image scanning is an important part of the organization’s security posture, securing images is a way to proactively ensure the security of your containerized applications earlier in the application life cycle.
Adopt the following best practices for creating container images securely:
Use minimal base images
To reduce the attack surface, the image should contain only the libraries and tools that the container needs.
Use base images from trusted sources
If you’re not building an image from scratch, choose base images that come from a trusted source. You should be able to see the Dockerfile and the source code for all of the image components, and it should be hosted in a reputable registry. Base images should also be updated frequently.
Specify a user
If neither the Dockerfile nor its base image specifies a user, the container runs as root by default, which both expands the potential attack surface and provides an easy path to privilege escalation if the application is compromised.
Verify Docker images
Ensuring image authenticity is a challenge, but it is an important component of building secure images. Any base images should be signed and validated. Pulling images only from trusted registries helps ensure that images are authentic.
Scan for vulnerabilities
It is important to find and fix vulnerabilities that exist within container images, including those introduced by open-source libraries.
Keep secrets out
Secrets, which include sensitive data such as credentials and keys, should not be embedded within container images.
Set resource limits
Limiting the amount of CPU or memory resources a container can access helps to restrict the damage it can do if it is used inappropriately.
Limit privileges
Set the container’s privileges to be as restrictive as possible and configure the container so that privileges cannot be escalated.
Use multi-stage builds
The build tools used to generate and compile applications can be exploited when they’re run on production systems. Instead of using the same images in the build and run phases, use multi-stage Dockerfiles to strip any unnecessary compilation tools out of the runtime images. Debuggers should also be taken out of production images.
Use security-focused coding practices
These can be enforced by using a linter to catch insecure code during the development process.
While these steps are specific to the image-building process, the principles behind building secure images should also inform security practices in other parts of the application. These practices include keeping the attack surface as small as possible, limiting privileges, and tightening configurations rather than using the default settings in Kubernetes and other aspects of the cloud-native stack.
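Some of these practices can be checked mechanically. The following is a minimal sketch, not a full Dockerfile parser, of a check for the "Specify a user" practice in a multi-stage build. The function names and the example Dockerfile (including its image names) are illustrative assumptions. The check is deliberately conservative: it ignores line continuations, and it treats a final stage without a USER instruction as root even though the base image might set a non-root user itself.

```python
def final_stage_user(dockerfile_text):
    """Return the last USER value in the final build stage, or None if unset."""
    user = None
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instruction = parts[0].upper()
        if instruction == "FROM":
            # A new stage begins, so forget any USER seen in an earlier stage.
            user = None
        elif instruction == "USER" and len(parts) == 2:
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text):
    """True if the final stage sets no USER or sets it to root (name or UID 0)."""
    user = final_stage_user(dockerfile_text)
    if user is None:
        return True
    name = user.split(":", 1)[0]  # USER may be written as user:group
    return name in ("root", "0")


# Hypothetical multi-stage Dockerfile: build tools stay in the first stage,
# and the runtime stage switches to an unprivileged UID.
EXAMPLE = """\
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN go build -o /out/app .

FROM gcr.io/distroless/static
COPY --from=build /out/app /app
USER 65532:65532
ENTRYPOINT ["/app"]
"""

print(runs_as_root(EXAMPLE))
```

A check like this could run in a CI pipeline alongside a linter, failing the build when the final image would run as root.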
CIS Benchmarks for Kubernetes
The Center for Internet Security (CIS) creates best practices for cyber defense. The CIS uses crowdsourcing to define its security recommendations. The CIS Benchmarks are among its most popular tools.
Organizations can use the CIS Benchmark for Kubernetes to harden their Kubernetes environments. A number of open source and commercial tools are available that automatically check against the settings and controls outlined in the CIS Benchmark to identify insecure configurations.
The CIS Benchmark provides a number of helpful configuration checks, but organizations should consider them a starting point and go beyond the CIS checks to ensure best practices are applied to Kubernetes, including implementing network policies, role-based access control (RBAC) settings, admin privileges, and other protections for the Kubernetes API server.
Configuring Kubernetes Role-Based Access Control (RBAC)
Kubernetes Role-Based Access Control (RBAC) provides the standard method for managing authorization for the Kubernetes API endpoints. Your cluster’s RBAC configuration controls which subjects can execute which verbs on which resource types in which namespaces. For example, a configuration might grant user "alice" access to view resources of type "pod" in the namespace external-api. The RBAC API includes four declarative objects: Role, ClusterRole, RoleBinding, and ClusterRoleBinding.
Roles are namespaced resources consisting of rules that set permissions within an individual namespace, whereas ClusterRoles are non-namespaced resources that grant clusterwide permissions or permissions that span multiple namespaces. Each rule is a combination of API groups, resource types, and verbs; the namespace is determined by the Role or binding rather than by the rule itself.
A role binding is the bridge that ties a user, group of users, or a service account (also known as subjects) to a role and grants those subjects the permissions defined in that role. A cluster role binding ties a ClusterRole to all the namespaces in your cluster. In this way, a RoleBinding assigns permissions within a namespace, whereas a ClusterRoleBinding grants those permissions clusterwide.
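To make the objects concrete, here is a sketch of the example from the text (user "alice" viewing pods in the external-api namespace) expressed as a Role and a RoleBinding. The manifests are built as Python dictionaries and printed as JSON, which kubectl accepts in place of YAML. The object names pod-viewer and alice-pod-viewer are made up for illustration, and mapping "view" to the get, list, and watch verbs is an assumption about what viewing should include.

```python
import json

# Role: namespaced set of rules. The empty string is the core API group,
# which is where the "pods" resource lives.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"namespace": "external-api", "name": "pod-viewer"},
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch"]}
    ],
}

# RoleBinding: ties the subject (user alice) to the Role in the same namespace.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"namespace": "external-api", "name": "alice-pod-viewer"},
    "subjects": [
        {"kind": "User", "name": "alice", "apiGroup": "rbac.authorization.k8s.io"}
    ],
    "roleRef": {
        "kind": "Role",
        "name": "pod-viewer",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}

# Wrap both objects in a List so they can be applied from a single file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, role_binding]}
print(json.dumps(manifest, indent=2))
```

Because the binding is a RoleBinding rather than a ClusterRoleBinding, alice gains no access to pods outside external-api.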
Based on our experience with customers, we have discovered the following five most common mistakes to look for in RBAC configuration settings.
Configuration mistake 1: Cluster administrator role granted unnecessarily
The built-in cluster-admin role grants unlimited access to the cluster. During the transition from the legacy ABAC controller to RBAC, some administrators and users may have replicated ABAC’s permissive configuration by granting cluster-admin widely, neglecting the warnings in the relevant documentation. If users or groups are routinely granted cluster-admin, account compromises or mistakes can have dangerously broad effects. Service accounts typically do not need this type of access either. In both cases, a more tailored Role or ClusterRole should be created and granted only to the specific users that need it.
Configuration mistake 2: Improper use of role aggregation
In Kubernetes 1.9 and later, role aggregation can be used to simplify privilege grants by allowing new privileges to be combined into existing roles. However, if these aggregations are not carefully reviewed, they can change the intended use of a role; for instance, the system:view role could improperly aggregate rules with verbs other than view, violating the intention that subjects granted system:view can never modify the cluster.
Configuration mistake 3: Duplicated role grant
Role definitions may overlap with each other, giving subjects the same access in more than one way. Administrators sometimes intend for this overlap to happen, but this configuration can make it more difficult to understand which subjects are granted which accesses. This situation can make access revocation more difficult if an administrator does not realize that multiple role bindings grant the same privileges.
Configuration mistake 4: Unused role
Roles that are created but not granted to any subject can increase the complexity of RBAC management. Similarly, roles that are granted only to subjects that do not exist (such as service accounts in deleted namespaces or users who have left the organization) can make it difficult to see the configurations that do matter. Removing these unused or inactive roles is typically safe and will focus attention on the active roles.
Configuration mistake 5: Grant of missing roles
Role bindings can reference roles that do not exist. If the same role name is reused for a different purpose in the future, these inactive role bindings can suddenly and unexpectedly grant privileges to subjects other than the ones the new role creator intends.
Properly configuring your cluster RBAC roles and bindings helps minimize the impact of application compromises, user account takeovers, application bugs, or simple human mistakes.
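Three of these mistakes (1, 4, and 5) lend themselves to a simple automated check. The sketch below assumes you have already exported the cluster's roles and bindings as lists of manifest dictionaries; the function name audit_rbac and the sample data are hypothetical. It reports ClusterRoleBindings to cluster-admin, bindings whose role does not exist, and roles that no binding references. In a real cluster, many built-in roles would show up as unused, so that result needs manual review.

```python
def audit_rbac(roles, bindings):
    """Return findings for cluster-admin grants, missing roles and unused roles."""

    def role_key(kind, namespace, name):
        # ClusterRoles are not namespaced, so their namespace slot is None.
        return (kind, namespace if kind == "Role" else None, name)

    defined = {
        role_key(r["kind"], r["metadata"].get("namespace"), r["metadata"]["name"])
        for r in roles
    }

    findings = {"cluster_admin_grants": [], "missing_roles": [], "unused_roles": []}
    referenced = set()
    for b in bindings:
        ref = b["roleRef"]
        key = role_key(ref["kind"], b["metadata"].get("namespace"), ref["name"])
        referenced.add(key)
        binding_name = b["metadata"]["name"]
        # Only a ClusterRoleBinding makes cluster-admin apply clusterwide.
        if b["kind"] == "ClusterRoleBinding" and ref["name"] == "cluster-admin":
            for s in b.get("subjects") or []:
                findings["cluster_admin_grants"].append(
                    (binding_name, s["kind"], s["name"])
                )
        if key not in defined:
            findings["missing_roles"].append((binding_name, ref["kind"], ref["name"]))

    findings["unused_roles"] = sorted(name for (_, _, name) in defined - referenced)
    return findings


# Hypothetical sample data, trimmed to the fields the audit reads.
roles = [
    {"kind": "ClusterRole", "metadata": {"name": "cluster-admin"}},
    {"kind": "Role", "metadata": {"namespace": "external-api", "name": "pod-viewer"}},
    {"kind": "Role", "metadata": {"namespace": "external-api", "name": "legacy-deployer"}},
]
bindings = [
    {
        "kind": "ClusterRoleBinding",
        "metadata": {"name": "ci-admin"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "ServiceAccount", "name": "ci-bot", "namespace": "ci"}],
    },
    {
        "kind": "RoleBinding",
        "metadata": {"namespace": "external-api", "name": "alice-pod-viewer"},
        "roleRef": {"kind": "Role", "name": "pod-viewer"},
        "subjects": [{"kind": "User", "name": "alice"}],
    },
    {
        "kind": "RoleBinding",
        "metadata": {"namespace": "external-api", "name": "bob-auditor"},
        "roleRef": {"kind": "Role", "name": "auditor"},
        "subjects": [{"kind": "User", "name": "bob"}],
    },
]

findings = audit_rbac(roles, bindings)
for category, items in findings.items():
    print(category, items)
```

Detecting improper aggregation and duplicated grants (mistakes 2 and 3) requires comparing the rules themselves and is not covered by this sketch.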
Container and Kubernetes vs. Virtual Machine (VM) security
Virtual machines (VMs) and containers have fundamentally different architectures, though they have just enough similarities to cause some confusion. The differences between containers and VMs have very important security ramifications. Both VMs and containers provide isolation to varying degrees, and both enable portability. Virtual machines are completely self-sufficient, have their own operating system, and do not share resources with other virtual machines. Containers share hosts with other containers, complicating the idea of a secure boundary.
Containers and Kubernetes present a different architectural paradigm that requires a different approach to security. The well-established techniques for host-based security don’t port over to containers. Other security techniques from the host or VM domain, such as building network firewalls around a defined perimeter, also don’t apply to containers. In addition, a key part of virtual machine security best practices is applying security patches, but patches cannot be applied to a running container; instead, you should update the container image and rebuild the container.
Securing containers requires:
- Controlling connections between containers
- Ensuring containers are free of known vulnerabilities
- Preventing containers from having root access
- Restricting permissions and access to only what’s needed for the application to function
Kubernetes adds an additional layer of complexity, and it introduces additional potential security risks. Managing Kubernetes configurations and networking policies is crucial to a strong security posture for containerized applications.
In addition, the workflow changes that come with moving to containerized applications make it important to integrate security throughout the entire life cycle. Security has to be baked into the application from the start, beginning with how images and containers are configured. You can’t add security to a containerized application at the end of the development process, right before deployment.
Security in containerized applications requires controlling the source of all components including open source elements, managing configurations, scanning images, and enabling granular role-based access controls. Containers and Kubernetes force a different approach to security, but given their declarative and immutable nature, they do present the opportunity—when properly configured—to build the most secure applications ever created.
Container runtime configuration
In Kubernetes, containers run inside pods, one of several Kubernetes Objects, and each Pod’s runtime configuration can be set and enforced using a combination of security context in the Pod specification, Pod Security Policies (PSPs), or an admission controller like the Open Policy Agent (OPA) Gatekeeper.
Security Context is defined in the deployment manifest and allows you to define the exact requirement for each workload. It can be configured for a Pod or Container. Pod Security Policies are a cluster-level Kubernetes resource that control the security context Pods can run with. If PSPs are enabled for a cluster, any attempt to create a Pod which does not adhere to its associated PSP will be rejected by the PSP admission controller.
Limit container runtime privileges:
- Do not run application processes as root. Set runAsUser to MustRunAsNonRoot in a Pod Security Policy, or runAsNonRoot to true in the security context
- Do not allow privilege escalation. Set allowPrivilegeEscalation to false
- Use a read-only root filesystem. Set readOnlyRootFilesystem to true
- Use the default (masked) /proc filesystem mount
- Do not use the host network or process space. Set hostPID, hostNetwork, and hostIPC to false
- Drop unused Linux capabilities and do not add optional capabilities that your application does not absolutely require
- Use SELinux options for more fine-grained process controls
- Give each application its own Kubernetes Service Account
- Do not mount the service account credentials in a container if it does not need to access the Kubernetes API
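Several items in this list map directly to fields in a Pod spec, so they can be verified before a workload is admitted. The following sketch checks a Pod spec (as a Python dictionary) for the settings named above. The function name check_pod_spec and both sample specs are hypothetical. Two choices here are assumptions rather than something the list states: the check uses the security context field runAsNonRoot, and it requires capabilities.drop to include ALL as a stricter stand-in for "drop unused capabilities".

```python
def check_pod_spec(pod_spec):
    """Return a list of human-readable violations for a Pod spec dict."""
    problems = []
    for field in ("hostPID", "hostNetwork", "hostIPC"):
        if pod_spec.get(field, False):
            problems.append(f"{field} must be false")

    pod_ctx = pod_spec.get("securityContext") or {}
    for c in pod_spec.get("containers", []):
        ctx = c.get("securityContext") or {}
        name = c.get("name", "<unnamed>")
        # A container-level runAsNonRoot overrides the Pod-level value when set.
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not run_as_non_root:
            problems.append(f"{name}: runAsNonRoot must be true")
        # Treat escalation as allowed unless it is explicitly disabled.
        if ctx.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if not ctx.get("readOnlyRootFilesystem", False):
            problems.append(f"{name}: readOnlyRootFilesystem must be true")
        drops = (ctx.get("capabilities") or {}).get("drop") or []
        if "ALL" not in drops:
            problems.append(f"{name}: capabilities.drop should include ALL")
    return problems


# Hypothetical specs: one hardened, one relying on defaults.
hardened = {
    "hostNetwork": False,
    "securityContext": {"runAsNonRoot": True},
    "containers": [
        {
            "name": "api",
            "image": "registry.example.com/api:1.0",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "readOnlyRootFilesystem": True,
                "capabilities": {"drop": ["ALL"]},
            },
        }
    ],
}
risky = {"hostPID": True, "containers": [{"name": "debug", "image": "busybox"}]}

print(check_pod_spec(hardened))
for problem in check_pod_spec(risky):
    print(problem)
```

The same rules could be enforced in the cluster by an admission controller; this sketch only shows the logic such a policy would encode.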
Use Kubernetes namespaces
Kubernetes namespaces provide scoping for cluster objects, allowing fine-grained cluster object management. Containers/Pods, services, and deployments within a namespace can be isolated using controls like Kubernetes network policies or have their access restricted using Kubernetes Role-Based Access Control (RBAC).
Plan out how you want to assign namespaces before you start deploying workloads to your clusters. Having one namespace per application provides the best opportunity for control, although it does incur additional management overhead when assigning RBAC role privileges and default network policies. If you do decide to group more than one application into a namespace, the main criteria should be whether those applications have common RBAC requirements and whether it would be safe to grant those privileges to the service accounts and users which need Kubernetes API access in that namespace.
Kubernetes configuration management
Broadly speaking, configuration management is the engineering practice of establishing policies regarding configurations and ensuring that those policies are applied consistently across the entire organization, throughout the application life cycle. Configurations are a critical part of managing security risk in cloud-native applications, especially because many default configurations for containers and Kubernetes are not secure. Misconfigurations are the most common source of security risk in containerized applications running on Kubernetes.
Configuration management must be automated, with guardrails managed centrally so that individual developers or operators are not responsible for manually configuring workloads. These guardrails should be based on organizational security policies.
According to IBM, 95% of cloud security failures are caused by human error. As applications become increasingly complicated, running on distributed systems in containers and Kubernetes, the risk of misconfigurations expands. In the absence of a centralized configuration management tool, it is nearly impossible for organizations to ensure that configuration policies are consistently applied. For companies with a multi-cloud or hybrid cloud setup, getting configuration right consistently is even more challenging, because each environment requires a different set of configurations. There’s also a persistent skills gap among developers and operators who aren’t always aware of best practices for secure configuration.
In many cases the easiest way for developers to set configurations for their applications to run is also the least secure, such as allowing root access, giving admin privileges or setting very high resource limits. With the right tools, configuration management can be integrated into the DevOps workflow so it doesn’t slow down development velocity. Doing so is a best practice, because it eliminates the tension between releasing quickly and securing the workload’s configurations.
Configuration management should provide both visibility into configurations and guardrails on which configurations are allowed, so that insecure builds or risky deployments can be failed automatically. Organizations need a single pane of glass to see all relevant configurations across containers and Kubernetes and to be alerted to potentially risky configurations.
The cornerstones of configuration management for containers and Kubernetes are the following:
- Role-based access controls (RBAC). Organizations need to find overly permissive configurations and/or unnecessary roles.
- Secrets. A good configuration management tool can proactively limit access to secrets.
- Policy-based assessments. Setting organizational security policies is a crucial part of any security posture, and there should be a way to check deployments against those pre-determined policies.
- Privileges. Privileges should be assigned based on least-privileged-access principles.
- Resource limits. Both containers and Kubernetes clusters should have limits on the CPU and memory available.
- Network policies. Network policies should limit the communication between parts of the application as much as possible to limit potential damage if a container is compromised.
The simplest way to start with configuration management is to follow industry-accepted best practices like the CIS Benchmarks. As the organization’s adoption of containers advances, creating organizational governance policies around configuration management is a best practice. Configuration management should cover both configurations for containers and Kubernetes, as configurations have to be managed appropriately in both places to ensure a strong security posture.
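As an illustration of a guardrail that fails a risky deployment automatically, the sketch below checks the "Resource limits" cornerstone: every container must declare CPU and memory limits. The names REQUIRED_LIMITS, missing_limits, and gate, and the sample spec, are hypothetical; a real pipeline would load the spec from the deployment manifest and use the returned value as its exit code.

```python
REQUIRED_LIMITS = ("cpu", "memory")


def missing_limits(pod_spec):
    """Map each container name to the limit keys it does not set."""
    gaps = {}
    for c in pod_spec.get("containers", []):
        limits = (c.get("resources") or {}).get("limits") or {}
        absent = [key for key in REQUIRED_LIMITS if key not in limits]
        if absent:
            gaps[c.get("name", "<unnamed>")] = absent
    return gaps


def gate(pod_spec):
    """Return an exit code: 0 to allow the deployment, 1 to fail it."""
    gaps = missing_limits(pod_spec)
    for name, absent in gaps.items():
        print(f"container {name}: missing limits for {', '.join(absent)}")
    return 1 if gaps else 0


# Hypothetical Pod spec with one compliant and two non-compliant containers.
spec = {
    "containers": [
        {"name": "api", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "sidecar", "resources": {"limits": {"memory": "64Mi"}}},
        {"name": "init-helper"},
    ]
}

exit_code = gate(spec)
print("exit code:", exit_code)
```

Running the check centrally in the pipeline, rather than asking each developer to remember it, is what turns the policy into a guardrail.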
Network segmentation
By default, Kubernetes allows all pods within a cluster to communicate freely. This makes application operations easier, but also creates a security risk. Although the defaults are overly permissive, Kubernetes also has built-in enforcement capabilities that can be configured to restrict communication between assets. Network segmentation is a part of restricting communication between parts of the deployment. Network segmentation is also required by some compliance frameworks, including PCI-DSS.
Network segmentation works by breaking networks into smaller subnetworks. From a security perspective, the primary advantage is that if a malicious actor gains access to one application running on the same Kubernetes cluster with other applications, network segmentation prevents that malicious actor from accessing all of the apps on the cluster. It’s also a way to isolate sensitive workloads and/or workloads that are in-scope for a particular compliance framework from other parts of the application.
Network policies should be as restrictive as possible, allowing individual containers to communicate with only the containers that are necessary for the application to function as designed.
In Kubernetes, network segmentation is done by enforcing network policies, both through Kubernetes native network enforcement capabilities as well as through using additional infrastructure layers like a service mesh.
By default, there are no restrictions on communication between pods, containers and nodes, either within the same namespace or between namespaces. A good starting point is to put network policies in place that restrict communication, generally beginning with a policy that denies all communication. Because pods do need to communicate with each other, it’s best to then systematically list the other pods that a given pod needs to communicate with. Ingress from and egress to the public internet should also be permitted on an allowlist basis, for only the pods that need it.
There is also some operational risk associated with changing network policies and increasing network segmentation. Using a tool to visualize how changes to system-wide network policies would impact the application can help minimize the risk of unexpected consequences from adjusting network policies.
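The "deny everything, then allow what is needed" approach can be sketched with two NetworkPolicy manifests, built here as Python dictionaries and printed as JSON. The helper names, the namespace, the labels, and the port are assumptions chosen for illustration. Keep in mind that network policies only take effect when the cluster's network plugin enforces them.

```python
import json


def default_deny(namespace):
    """NetworkPolicy that selects every Pod in the namespace and allows nothing."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # an empty selector matches all Pods
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_ingress(namespace, name, to_labels, from_labels, port):
    """NetworkPolicy that lets Pods matching from_labels reach to_labels on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": to_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": from_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policies = [
    default_deny("external-api"),
    allow_ingress(
        "external-api",
        "frontend-to-api",
        to_labels={"app": "api"},
        from_labels={"app": "frontend"},
        port=8080,
    ),
]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Because the default-deny policy also blocks egress, the frontend Pods in this example would additionally need an egress rule, and usually DNS access, before the allowed traffic can flow.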
Risk profiling
No organization will ever have a perfectly secure application or IT infrastructure. Security requires prioritizing and understanding the risks and tradeoffs associated with different actions. Risk profiling is the process of outlining the organization’s known security risks and its policies and practices related to managing that risk. Every organization must accept some level of risk, but should be clear about how much risk is acceptable. Risk profiling should be done not only for the organization as a whole, but for individual applications. Sensitive workloads, or workloads that are in scope for compliance requirements, have a different risk profile than non-sensitive workloads.
Risk profiling also helps assess the significance of vulnerabilities that exist within the environment. Responding to every vulnerability would be impossible, so a strong security posture requires evaluating the risk of every vulnerability in order to prioritize remediation correctly.
In a distributed, containerized application, it can be difficult to understand and prioritize an application’s risk profile. There might be hundreds of vulnerabilities in any given application, but not all vulnerabilities carry the same risk. Security risk from a vulnerability depends on factors such as:
- The severity of the vulnerability
- Whether or not the application is public-facing
- Whether the application is in production
- Whether the application is in scope for compliance regulations
- Whether or not the application accesses sensitive data
- The container’s privilege level
- The container’s network exposure
While organizations should define ahead of time what level of risk is acceptable, often by establishing internal policies about how quickly vulnerabilities at each severity level must be fixed, risk profiling is not a static exercise. The process of evaluating security risks, particularly in the context of a containerized application, has to happen continually during runtime.
Manually triaging potential security incidents, vulnerabilities and policies is a recipe for error and burnout. Especially at scale, risk profiling often simply isn’t possible to do without relying on automated tools to uncover and prioritize security risks. Successful risk profiling in Kubernetes should make use of Kubernetes’ declarative, contextual data to automate the prioritization process. This allows security teams to focus on fixing the highest-risk deployments first instead of spending time on the risk profiling process.
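One way to picture automated prioritization is a score that combines vulnerability severity with the contextual factors listed above. The sketch below is purely illustrative: the Workload fields, the weights, and the multiplicative formula are assumptions, not an established scoring standard, and a real tool would derive the context from the deployment's declarative configuration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    max_cvss: float  # severity of the worst known vulnerability, 0 to 10
    public_facing: bool = False
    in_production: bool = False
    in_compliance_scope: bool = False
    handles_sensitive_data: bool = False
    privileged: bool = False
    network_exposed: bool = False


# Hypothetical weights: each true factor increases the severity multiplier.
CONTEXT_WEIGHTS = {
    "public_facing": 0.5,
    "in_production": 0.5,
    "in_compliance_scope": 0.25,
    "handles_sensitive_data": 0.25,
    "privileged": 0.5,
    "network_exposed": 0.25,
}


def risk_score(workload):
    """Scale the vulnerability severity by the workload's context."""
    multiplier = 1.0 + sum(
        weight for attr, weight in CONTEXT_WEIGHTS.items() if getattr(workload, attr)
    )
    return workload.max_cvss * multiplier


workloads = [
    Workload("batch-report", max_cvss=9.8),
    Workload(
        "payments-api",
        max_cvss=7.5,
        public_facing=True,
        in_production=True,
        in_compliance_scope=True,
        handles_sensitive_data=True,
        network_exposed=True,
    ),
]

ranked = sorted(workloads, key=risk_score, reverse=True)
for w in ranked:
    print(f"{w.name}: {risk_score(w):.2f}")
```

With these assumed weights, the exposed production service outranks the isolated batch job even though the batch job has the more severe vulnerability, which is the kind of reordering that severity alone would miss.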
Ideally, risk profiling can be used as both a reactive and proactive tool. When risks are found and fixed in one deployment, that information can be used to find other deployments with similar risk factors and proactively address the potential security risks ahead of time.
Runtime detection and response
Runtime security is a critical line of defense against malicious actors. Ideally any unpatched vulnerabilities, insecure configurations, or insecure credentials would be caught at the build or deployment stage. In reality, though, runtime detection and response is essential because sometimes vulnerabilities slip through these earlier phases, and because new vulnerabilities are continually being discovered. It’s also important for compliance reasons and as a line of defense against internal threats.
Declarative, immutable workloads require an entirely new model for detecting and responding to potential security incidents in runtime. The fact that containers generally run a minimal number of processes, combined with the declarative nature of Kubernetes, actually makes some aspects of runtime security easier than in virtual machine (VM)-based applications. On the other hand, running containers should not be ‘patched’ the same way security patches would be applied to a VM-based app; instead they should be treated as immutable and be killed, updated, and restarted.
Detection is the cornerstone of runtime security. This involves establishing a baseline for how the application behaves and investigating any activity that deviates too far from that baseline. Some of the activities that might be tracked include network requests and process executions. When those activities deviate from what is expected, for example when a container tries to connect to the internet when that isn’t allowed, it could be a sign of suspicious or malicious activity. Even when it is not, that type of anomalous behavior points to something that needs to be addressed.
Anomaly detection can be more accurate in containers than it is for VM-based workloads, because a container typically runs a single application, making it easier to isolate what is and is not baseline behavior for a container. Anomaly detection, however, should also always be connected to an incident response process.
Depending on the type of anomalous behavior, the best course of action might be to respond automatically, by having the platform kill the impacted pods or containers. In other cases, it might be more appropriate to send an alert and evaluate the behavior manually. However, potential incident response should be as automated as possible to minimize response times and increase overall security of containerized applications.
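The baseline-then-deviate idea can be sketched in a few lines for process executions. Everything here is a simplified assumption: the class name ProcessBaseline, the explicit learning period that is ended by calling lock, and the respond function that maps an anomaly to an action string. A real runtime security tool would collect these events from the container runtime or kernel and take the action through the Kubernetes API.

```python
from collections import defaultdict


class ProcessBaseline:
    """Learns which executables each deployment runs, then flags new ones."""

    def __init__(self):
        self._known = defaultdict(set)
        self.locked = False

    def lock(self):
        """End the learning period; later unseen executables are anomalies."""
        self.locked = True

    def observe(self, deployment, executable):
        """Return True if this execution deviates from the locked baseline."""
        if not self.locked:
            self._known[deployment].add(executable)
            return False
        return executable not in self._known[deployment]


def respond(anomalous, auto_remediate):
    """Pick a response: nothing, an alert for manual review, or an automatic kill."""
    if not anomalous:
        return "none"
    return "kill-pod" if auto_remediate else "alert"


baseline = ProcessBaseline()
baseline.observe("frontend", "/usr/bin/node")  # learned during the baseline period
baseline.lock()

for exe in ("/usr/bin/node", "/bin/sh"):
    anomalous = baseline.observe("frontend", exe)
    print(exe, respond(anomalous, auto_remediate=True))
```

Because a container usually runs very few processes, even a baseline this small is meaningful: a shell appearing in a container that has only ever run its application binary is worth a response.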
Secure the kubelet
The kubelet is the main "node agent" running on each node. Misconfiguring it can expose you to a host of security risks. You can either use arguments on the running kubelet executable or a kubelet config file to set the configuration of your kubelet.
To find the kubelet config file, run the following command:
ps -ef | grep kubelet | grep config
Look for the --config argument, which will give you the location of the kubelet config file.
Then run the following command on each node:
ps -ef | grep kubelet
In the output, make sure that the:
--anonymous-auth argument is false. In the kubelet article previously referenced, one of the misconfigurations exploited was where anonymous (and unauthenticated) requests were allowed to be served by the kubelet server.
--authorization-mode argument does not show as AlwaysAllow if it’s there. If it is not there, make sure there’s a kubelet config file specified by --config and that file has set authorization: mode to something besides AlwaysAllow.
--client-ca-file argument is there and set to the location of the client certificate authority file. If it’s not there, make sure there’s a kubelet config file specified by --config and that file has set authentication: x509: clientCAFile to the location of the client certificate authority file.
--read-only-port argument is there and set to 0. If it’s not there, make sure there’s a kubelet config file specified by --config, and readOnlyPort is set to 0 if it’s there.
--protect-kernel-defaults argument shows as true. If it’s not there, make sure there’s a kubelet config file specified by --config, and that file has set protectKernelDefaults as true.
--hostname-override argument is not there, to ensure that the TLS setup between the kubelet and the API Server doesn’t break.
--event-qps argument is there and set to 0. If it’s not there, make sure there’s a kubelet config file specified by --config and eventRecordQPS shows as 0.
--tls-cert-file and --tls-private-key-file arguments are set appropriately or the kubelet config specified by --config contains appropriate settings for tlsCertFile and tlsPrivateKeyFile. This configuration ensures that all connections happen over TLS on the kubelets.
RotateKubeletServerCertificate feature gate and --rotate-certificates argument are set to true if your kubelets get their certs from the API Server. Also make sure your kubelet uses only strong crypto ciphers.
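Checking these arguments by eye on every node is error-prone, so the review can be scripted. The sketch below parses a kubelet command line, such as one copied from the ps output, and reports a subset of the checks above. The parser is a simplification, the function names are hypothetical, and the sample command line (including its config path) is made up. Flags missing from the command line are reported for follow-up rather than as failures, because they may be set in the file given by --config.

```python
import shlex


def parse_flags(command_line):
    """Parse '--name=value', '--name value' and bare '--name' into a dict."""
    tokens = shlex.split(command_line)
    flags = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            name, sep, value = tok[2:].partition("=")
            if not sep:
                # No '=': take the next token as the value unless it is a flag.
                if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                    value = tokens[i + 1]
                    i += 1
                else:
                    value = "true"  # bare boolean flag
            flags[name] = value
        i += 1
    return flags


# Subset of the expected values described in the list above.
EXPECTED = {
    "anonymous-auth": "false",
    "read-only-port": "0",
    "protect-kernel-defaults": "true",
}


def check_kubelet(flags):
    """Return findings for the kubelet flags parsed from the command line."""
    findings = []
    for name, want in EXPECTED.items():
        got = flags.get(name)
        if got is None:
            findings.append(f"--{name} not on the command line; check the --config file")
        elif got != want:
            findings.append(f"--{name} is {got}, expected {want}")
    if flags.get("authorization-mode") == "AlwaysAllow":
        findings.append("--authorization-mode must not be AlwaysAllow")
    if "hostname-override" in flags:
        findings.append("--hostname-override should not be set")
    return findings


# Hypothetical command line as it might appear in the ps output.
SAMPLE = (
    "/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml "
    "--anonymous-auth=false --authorization-mode=AlwaysAllow "
    "--read-only-port=10255 --hostname-override=node-1"
)

for finding in check_kubelet(parse_flags(SAMPLE)):
    print(finding)
```

A fuller version would also read the kubelet config file and apply the same expectations to fields such as readOnlyPort and protectKernelDefaults.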
Securing the Kubernetes API server
The Kubernetes API server handles the REST API calls from users or applications running within the cluster to enable cluster management. The API server is considered the gateway to the Kubernetes control plane, and you can access it using kubectl, client libraries, or by making API requests directly. One way to manage authorization for the Kubernetes API server is using Kubernetes Role-Based Access Control (RBAC). You can also validate requests to the API server using admission controllers.
Protecting the API server starts with controlling its access. The Center for Internet Security (CIS) provides configuration best practices to harden and secure the API server.
Run the below command on your master node:
ps -ef | grep kube-apiserver
In the output, check to ensure that the:
--anonymous-auth
argument shows as false
. This setting ensures that requests not rejected by other authentication methods are not treated as anonymous and therefore allowed against policy.
--basic-auth-file
argument isn’t there. Basic auth uses plaintext credentials, instead of the preferred tokens or certificates, for authentication.
--insecure-allow-any-token
argument isn’t there. This setting will ensure that only secure tokens that are authenticated are allowed.
–kubelet-https
argument either isn’t there or shows as true
. This configuration ensures that connections between the API server and the kubelets are protected in transit via Transport Layer Security (TLS).
--insecure-bind-address
argument isn’t there. This configuration will prevent the API server from binding to an insecure address, preventing non-authenticated and unencrypted access to your master node, which minimizes your risk of attackers potentially reading sensitive data in transit.
--insecure-port
argument shows as 0
. This setting will prevent the API server from serving on an insecure port, which would prevent unauthenticated and unencrypted access to the master node and minimize the risk of an attacker taking control of the cluster.
--secure-port
argument either doesn’t exist or shows up as an integer between 1 and 65535. The goal here is to make sure all your traffic is served over https with authentication and authorization.
--profiling
argument shows as false
. Unless you’re experiencing bottlenecks or need to troubleshoot something, there’s no need for the profiler, and having it there unnecessarily opens you to exposure of system and program details.
--repair-malformed-updates
argument shows as false
. This setting will ensure that intentionally malformed requests from clients are rejected by the API server.
--enable-admission-plugins
argument is set with a value that doesn’t contain AlwaysAdmit
. If you configure this setting to always admit, then it will admit requests even if they’re not explicitly allowed by the admissions control plugin, which would decrease the plugin’s effectiveness.
--enable-admission-plugins
argument is set with a value that contains AlwaysPullImages
. This configuration ensures that users aren’t allowed to pull images from the node to any pod by simply knowing the name of the image. With this control enabled, images will always be pulled prior to starting a container, which will require valid credentials.
--enable-admission-plugins
argument is set with a value that contains SecurityContextDeny
. This control ensures that you can’t customize pod-level security context in a way not outlined in the Pod Security Policy.
--disable-admission-plugins
argument is set with a value that does not contain NamespaceLifecycle
. You don’t want to disable this control, because it ensures that objects aren’t created in non-existent namespaces or in those namespaces set to be terminated.
--audit-log-path
argument is set to an appropriate path where you want your audit logs to be stored. It’s always a good security practice to enable auditing for any Kubernetes components, when available, including the Kubernetes API server.
--audit-log-maxage
argument is set to 30
or whatever number of days you must store your audit log files to comply with internal and external data retention policies.
--audit-log-maxbackup
argument is set to 10
or any number that helps you meet your compliance requirements for retaining the number of old log files.
--audit-log-maxsize
argument is set to 100
or whatever number that helps you meet your compliance requirements. Note that number 100 represents 100 MB.
--authorization-mode argument is there and is not set to AlwaysAllow. This setting ensures that only authorized requests are allowed by the API server, especially in production clusters.
--token-auth-file argument is not there. This argument, when present, enables static token-based authentication, which has several security flaws; use alternate authentication methods instead, such as certificates.
--kubelet-certificate-authority argument is there. This setting helps prevent a man-in-the-middle attack when there’s a connection between the API Server and the kubelet.
--kubelet-client-certificate and --kubelet-client-key arguments are there. This configuration ensures that the API Server authenticates itself to the kubelet’s HTTPS endpoints. (By default, the API server doesn’t take this step.)
--service-account-lookup argument is there and set to true. This setting helps prevent an instance where the API server verifies only the validity of the authentication token without ensuring that the service account token included in the request is present in etcd.
--enable-admission-plugins argument is set to a value that contains PodSecurityPolicy.
--service-account-key-file argument is there and is set to a separate public/private key pair for signing service account tokens. If you don’t specify a public/private key pair, it will use the private key from the TLS serving certificate, which would inhibit your ability to rotate the keys for service account tokens.
--etcd-certfile and --etcd-keyfile arguments are there so that the API server identifies itself to the etcd server using client cert and key. Note that etcd stores objects that are likely sensitive in nature, so any client connections must use TLS encryption.
--disable-admission-plugins argument is set and doesn’t contain ServiceAccount. Keeping this plugin enabled lets the API server automate service account handling for new pods. You should still create dedicated service accounts rather than let pods fall back to the default service account in their namespace.
--tls-cert-file and --tls-private-key-file arguments are there such that the API Server serves only HTTPS traffic via TLS.
--client-ca-file argument exists to ensure that TLS and client cert authentication is configured for Kubernetes cluster deployments.
--etcd-cafile argument exists and is set so that the API server verifies the etcd server’s certificate against a trusted certificate authority file.
--tls-cipher-suites argument is set in a way that uses strong crypto ciphers.
--authorization-mode argument is there with a value containing Node. This configuration limits which objects kubelets can read associated with their nodes.
--enable-admission-plugins argument is set and contains the value NodeRestriction. This plugin ensures that a kubelet is allowed to modify only its own Node API object and those Pod API objects associated with its node.
--encryption-provider-config argument is set to an EncryptionConfig file, and this file should list all the needed resources. This setting ensures that all the REST API objects stored in the etcd key-value store are encrypted at rest.
Make sure the aescbc encryption provider is utilized for all desired resources, as this encryption provider is considered the strongest.
--enable-admission-plugins argument contains the value EventRateLimit to set a limit on the number of events accepted by the API server for performance optimization of the cluster.
--feature-gates argument is not set with a value containing AdvancedAuditing=false. In other words, make sure advanced auditing is not disabled, because it is needed for auditing and investigation purposes.
--request-timeout argument is either not set or set to an appropriate value (neither too short, nor too long). Default value is 60 seconds.
--authorization-mode argument exists and is set to a value that includes Kubernetes RBAC.
This setting ensures that RBAC is turned on. Beyond simply turning it on, you should follow several other recommendations for how to best use RBAC, including:
- Avoid giving users cluster-admin role because it gives very broad powers over the environment and should be used very sparingly, if at all.
- Audit your role aggregation rules to ensure you’re using them properly.
- Don’t grant duplicated permissions to subjects because it can make access revocation more difficult.
- Regularly remove unused roles.
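A few of the API server checks above can be automated. The following is a minimal sketch, not a complete benchmark: it assumes you can obtain the kube-apiserver command line as a single string (for example, from a static pod manifest), and it covers only a subset of the flags discussed. The function names and the file paths in the usage note are hypothetical.

```python
import shlex


def parse_flags(command_line):
    """Parse '--name=value' arguments into a dict (bare flags map to '')."""
    flags = {}
    for token in shlex.split(command_line):
        if token.startswith("--"):
            name, _, value = token[2:].partition("=")
            flags[name] = value
    return flags


def csv_values(flags, name):
    """Return a comma-separated flag value as a set of items."""
    return {item for item in flags.get(name, "").split(",") if item}


def audit_api_server(command_line):
    """Return a list of findings for a subset of the checks in this section."""
    flags = parse_flags(command_line)
    enabled = csv_values(flags, "enable-admission-plugins")
    disabled = csv_values(flags, "disable-admission-plugins")
    modes = csv_values(flags, "authorization-mode")
    findings = []

    if not flags.get("audit-log-path"):
        findings.append("--audit-log-path is not set")
    if "token-auth-file" in flags:
        findings.append("--token-auth-file is set (static token authentication)")
    if not modes or "AlwaysAllow" in modes:
        findings.append("--authorization-mode is missing or includes AlwaysAllow")
    for mode in ("Node", "RBAC"):
        if mode not in modes:
            findings.append(f"--authorization-mode does not include {mode}")
    if "NodeRestriction" not in enabled:
        findings.append("--enable-admission-plugins does not include NodeRestriction")
    for plugin in ("NamespaceLifecycle", "ServiceAccount"):
        if plugin in disabled:
            findings.append(f"--disable-admission-plugins contains {plugin}")
    return findings
```

Calling `audit_api_server("kube-apiserver --authorization-mode=AlwaysAllow --token-auth-file=/etc/tokens.csv")` returns one finding per failed check, while a command line that satisfies all of the checks returns an empty list.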
The security challenge with default settings
One of the biggest risks with containers and Kubernetes is that neither technology’s default configurations are secure. Reducing the risk of a security incident requires proactively changing those default configurations consistently across the entire organization. Neglecting this step, through oversight, lack of knowledge, or workflows that don’t incorporate a configuration management step, will lead to workloads that are unnecessarily vulnerable.
Default settings in containers
Many default container settings, as well as common practices when building containers, can leave the containers vulnerable. Here are some things to look out for when configuring containers.
- Specify a user. If a user is not specified, it will default to the root user, potentially giving the container root access to the host.
- Verify images. Default settings do not enforce image verification, leading to the potential to pull compromised images unknowingly.
- Set resource limits. Resource limits can be configured, but there are no default limits. Limiting the CPU and memory a container can consume helps prevent the container from consuming large amounts of resources if it becomes compromised.
- Install tools and libraries selectively. The fewer tools and libraries are in the containers, the fewer tools a malicious actor has to exploit if they get access to the container. Make sure you don’t just install a standard set of tools in every container, but rather install only what’s actually needed.
- Control access to the registry by allow listing. In addition to making sure any images you use are from trusted sources, access to the registry has to be tightly controlled, ideally by allow listing only trusted users.
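Two of the container defaults above, the missing user and the missing resource limits, are easy to check mechanically. The sketch below is illustrative only: it assumes the Dockerfile is available as text, that the base image does not set a user of its own, and that the container spec has been loaded into a dictionary shaped like a Kubernetes container entry. The function names are hypothetical.

```python
def final_user(dockerfile_text):
    """Return the user set by the last USER instruction in the final build stage."""
    user = None
    for line in dockerfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        instruction = parts[0].upper()
        if instruction == "FROM":
            user = None  # a new build stage starts without an explicit user
        elif instruction == "USER":
            user = parts[1].strip()
    return user


def runs_as_root(dockerfile_text):
    """True if the final stage names no user, or names root explicitly."""
    user = final_user(dockerfile_text)
    if user is None:
        # Assumption: the base image does not set a non-root user itself.
        return True
    return user.split(":", 1)[0] in ("root", "0")


def missing_limits(container):
    """List which of the cpu and memory limits a container spec leaves unset."""
    limits = container.get("resources", {}).get("limits", {})
    return [key for key in ("cpu", "memory") if key not in limits]
```

A check like this could run in CI before an image is built or a manifest is applied, so that the feedback reaches the developer early.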
Default settings in Kubernetes
Kubernetes also provides many tools for improving the organization’s security posture, but they have to be actively configured to provide security benefits. Here are some core things to check in Kubernetes.
- Configure Role-Based Access Controls (RBAC). RBAC is enabled by default in Kubernetes 1.6 and later, but still needs to be configured correctly to provide benefits. Ideally, access should be granted per namespace rather than cluster-wide.
- Use namespaces. The ‘default’ is to run everything in the same namespace. Actively use the separation provided by namespaces to isolate workloads from each other.
- Use Kubernetes Network Policies. There are no network policies configured out-of-the-box, so organizations need to install a network plugin to control ingress and egress traffic from the application and configure policies accordingly.
- Enable Audit Logging. Audit logging is not usually enabled by default, but should be turned on to get visibility into anomalous API calls and authorization failures.
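As an illustration of the namespace and network policy items above, the following sketch generates a dedicated namespace and a default-deny NetworkPolicy as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace name `payments` and the policy name `default-deny-all` are placeholders, and the policy only has an effect when the cluster's network plugin enforces network policies.

```python
import json


def namespace(name):
    """Build a Namespace manifest."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def default_deny(namespace_name):
    """Build a NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace_name},
        "spec": {
            "podSelector": {},  # empty selector: applies to all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so both are denied
        },
    }


def as_list(*manifests):
    """Wrap manifests in a List object and serialize it as JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": list(manifests)}, indent=2)


if __name__ == "__main__":
    print(as_list(namespace("payments"), default_deny("payments")))
```

Starting from a deny-all policy and then adding narrowly scoped allow rules keeps the default restrictive, which matches the intent of the checklist.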
The risk of doing nothing
Usually development teams without deep security expertise are the first to use containers and Kubernetes. Ignoring security, however, exposes organizations to risk regardless of the type of infrastructure they use.
As mentioned earlier, and it bears repeating, configurations in both containers and Kubernetes are not secure by default. Kubernetes’ native security functionality must be actively configured to provide any security benefits. The fastest, easiest way to get an application live is often to give it too many privileges, to set resource limits far higher than the application needs, or to leave defaults as is. Although organizations can be tempted to neglect security at first, particularly as they familiarize themselves with containers and Kubernetes, doing so puts them at major risk. Malicious actors are able to exploit containerized applications on Kubernetes just as easily as they can exploit legacy applications.
The core risk of doing nothing is that the application will be compromised by malicious actors. How catastrophic that would be depends on the organization’s industry, the type of application in use, and the scope and type of breach. The likelihood that neglecting security will result in an incident also depends on factors such as whether the application is Internet facing.
Doing nothing to ensure your containerized applications are secure risks making your applications unnecessarily vulnerable. Ignoring security leads to:
Unpatched security vulnerabilities. New security vulnerabilities are discovered all the time, and some of them have been serious. Both the open-source community and malicious actors will know about these vulnerabilities as soon as they are published - failing to address them quickly is risky.
Lax permissions. While role-based access control (RBAC) is enabled by default in Kubernetes, it’s up to the developer to put the access controls in place. Without security guidance, developers will often create workloads that have too much access.
Non-isolated workloads. By default, everything can run in a single default namespace. Using namespaces to start isolating workloads is a basic security best practice. Without conscious attention to this step of security, this level of isolation will not happen.
No control over network policies. Kubernetes Network Policies can help organizations control traffic, but they require a network plugin and for the policies to be configured.
Waiting to apply security controls will ultimately lead to disjointed security practices, a security review bottleneck that reduces development speed, and an increased risk of security incidents. Instead, apply security controls early and often in the software development life cycle.
Top container and Kubernetes security best practices
As the de facto container orchestration system, Kubernetes makes container management possible but also introduces potential security vulnerabilities into infrastructure environments. The differences between securing containers and Kubernetes and securing virtual machines, combined with a continued skills gap in Kubernetes security, can lead to unnecessary security risks.
Security best practices for Kubernetes, like all security best practices, include both practices that make the application and infrastructure more secure and organizational and cultural practices that establish centralized control over security.
Creating and enforcing organizational security policies is a best practice regardless of the tech stack the company relies on. Security is a process of risk management, and tools cannot be relied on to decide how much risk is acceptable for each application, for example. This kind of decision has to be made by humans who can take into account how much risk is acceptable for the organization in general, for individual business units, and for each application.
Central control of security. Related to the first point, organizations need a way to ensure that the security and governance policies they have set are being followed. Central teams need to have visibility into configurations and vulnerabilities throughout the entire distributed application, and should have a way to easily visualize and prioritize potential problems. In addition, they need to be able to create guardrails so that individuals get instant feedback when a risky configuration, insecure image, or other potential security risk is part of a build.
Partner with security earlier. ‘Shifting left’ and partnering with security earlier in the development process not only helps remove the security review bottleneck and helps get applications out the door quicker, but also decreases the likelihood of errors resulting in a vulnerability or misconfiguration being exploited.
Leverage automation. Particularly as the Kubernetes footprint expands to multiple clusters and hundreds of namespaces, managing configurations or monitoring runtime behavior manually is no longer possible.
There are also some very important technical best practices specific to making Kubernetes as secure as possible.
- Keep Kubernetes up to date. Because security patches are not always released for older versions, it’s a good idea to run a newer, supported release.
- Use role-based access control. Access should always be configured on a least-privilege access basis.
- Limit communications between pods. Limits should be as restrictive as possible for the pods to function as designed.
- Use network segmentation. Each pod should be able to communicate only with the internal or external resources it needs to and remain isolated from all other resources.
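The least-privilege item above can be supported by a periodic RBAC review. The sketch below assumes the binding and role objects have already been loaded into dictionaries (for example, the `items` array from `kubectl get clusterrolebindings -o json`). It reports who holds cluster-admin and which rules use wildcards. The function names are hypothetical, and the checks are a starting point rather than a full audit.

```python
def cluster_admin_subjects(bindings):
    """Return (binding name, subject kind, subject name) for each cluster-admin grant."""
    found = []
    for binding in bindings:
        if binding.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        for subject in binding.get("subjects") or []:
            found.append((binding["metadata"]["name"], subject["kind"], subject["name"]))
    return found


def wildcard_rules(role):
    """Return the rules of a Role or ClusterRole that use '*' for verbs or resources."""
    return [
        rule
        for rule in role.get("rules") or []
        if "*" in rule.get("verbs", []) or "*" in rule.get("resources", [])
    ]
```

Each result is a candidate for review, not automatically a violation: some system components legitimately need broad access.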
Vulnerability management best practices
Vulnerability management is a critical component of keeping applications secure. It is the process of identifying, assessing, and fixing security vulnerabilities at all stages of the software development lifecycle. Vulnerability management in containerized, cloud-native applications needs to be automated and integrated into the DevOps processes of building and shipping applications. The environment is too complex to manage vulnerabilities manually and, in the real world, if vulnerability management slows development down too much, organizations will be tempted to skip security safeguards.
Vulnerability management is not a gate that the application has to pass through, but rather a continuous process that starts with image scanning and introspection at the build stage and continues throughout the application’s lifecycle, in test and production environments.
Image scanning and implementing policies regarding image vulnerabilities during the build phase are the first steps towards effective container-native vulnerability management. The ability to run scans on demand, as images are built, or once containers are running is important for spotting vulnerabilities that may have been exposed during runtime. Vulnerability management has to be able to spot exposure in both containers and Kubernetes, as either can be the source of vulnerabilities.
There is no such thing as a completely secure application, and good vulnerability management gives teams not only visibility into vulnerabilities but also the additional information needed to prioritize a given vulnerability according to its organization-specific criticality. For example, even a high-priority CVE has a different risk profile depending on the sensitivity of the workload. Good vulnerability management is about being able to balance, evaluate, and prioritize fixes to establish the best possible security posture.
Vulnerability management should be primarily automated in cloud-native applications. Human intelligence is needed to define policies, but tooling should be responsible for finding policy violations and taking appropriate action based on the vulnerability, risk level and the part of the life cycle, from automatically failing builds to blocking deployments or scaling them to zero in production.
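The split described above, where people define the policy and tooling enforces it, can be sketched as a small decision function. Everything here is an assumption made for illustration: the severity scale, the default threshold, the stage names, the action names, and the placeholder CVE identifiers in the usage note. Real tooling exposes its own policy model.

```python
from dataclasses import dataclass

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Hypothetical mapping from life cycle stage to enforcement action.
ACTION_BY_STAGE = {
    "build": "fail-build",
    "deploy": "block-deployment",
    "runtime": "scale-to-zero",
}


@dataclass(frozen=True)
class Finding:
    cve: str
    severity: str
    fixable: bool


def action_for(findings, stage, threshold="high"):
    """Pick an enforcement action for one image at a given life cycle stage."""
    violating = [
        f for f in findings
        if f.fixable and SEVERITY_RANK[f.severity] >= SEVERITY_RANK[threshold]
    ]
    if not violating:
        return "allow"
    return ACTION_BY_STAGE[stage]
```

For example, `action_for([Finding("CVE-0000-0001", "critical", True)], "build")` returns `"fail-build"`, while the same finding marked as not fixable returns `"allow"` under this particular policy. Whether unfixable vulnerabilities should pass is itself a policy decision for the organization.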
Vulnerability scanning—images, in running deployments
Image and vulnerability scanning should start during the build phase but has to continue throughout the entire application lifecycle, including in runtime. New security vulnerabilities can be discovered at any time, and the ability to detect any vulnerabilities in running deployments is critical to the organization’s security posture. Vulnerabilities in running deployments could result in an immediate security risk and organizations need a way to detect and remediate them as soon as possible.
At the build phase, non-compliant images, including those with severe and fixable vulnerabilities, should fail to build. DevOps teams should get that feedback directly in the CI system. At deploy time, security tooling can apply admission control to automatically prevent containers with known vulnerabilities detected in the image from being deployed. It’s crucial to know how to prioritize remediation, depending on vulnerability severity, the sensitivity of the workload, and the organization’s general tolerance of security risk. Organizations should take the time to create customized policies and implement tools that allow those policies to be enforced - at build time and deploy time - through automation. And after deployments are running, organizations should still continue to scan for vulnerabilities.
Different capabilities in image scanners
Not all image scanners provide the same level of comprehensive checks: Some scan only the underlying operating system, others also scan libraries, others do language-level checks, and others scan file contents. It’s important to choose an image scanner that is at least as comprehensive as the organization needs, as well as one that is compatible with the programming languages used by your applications.
Some image scanners perform a real-time scan upon each image pull, but this approach can increase latency, so organizations have to decide whether the real-time information is worth the performance hit.
Scanning in runtime
As with image scanning during the build phase, not all detected vulnerabilities merit the same response. Organizations need a way to prioritize remediation based on workload sensitivity, data sensitivity, and Internet exposure, along with the severity of the detected vulnerabilities. No two organizations will have the same procedures or service level objectives to guide appropriate response to discovered vulnerabilities. There are trade-offs associated with, for example, blocking every container with discovered vulnerabilities, regardless of the severity or sensitivity. Successful vulnerability scanning in running deployments requires both the right tools to ensure the right visibility and information and thoughtful organizational security policies that strike the right balance between vulnerability management and operational impact.
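One way to make this prioritization concrete is a simple risk score that combines severity with workload context. The weights and multipliers below are invented for illustration and would need to reflect each organization's own risk tolerance. The deployment names in the usage note are placeholders.

```python
from dataclasses import dataclass

# Hypothetical weights; tune these to the organization's own policy.
SEVERITY_WEIGHT = {"low": 1, "medium": 3, "high": 6, "critical": 10}


@dataclass(frozen=True)
class RunningFinding:
    deployment: str
    severity: str
    sensitive_data: bool
    internet_facing: bool


def risk_score(finding):
    """Scale the severity weight by workload sensitivity and Internet exposure."""
    score = SEVERITY_WEIGHT[finding.severity]
    if finding.sensitive_data:
        score *= 2
    if finding.internet_facing:
        score *= 2
    return score


def prioritize(findings):
    """Order findings so the highest-risk ones are remediated first."""
    return sorted(findings, key=risk_score, reverse=True)
```

Under these example weights, a medium-severity finding in an Internet-facing deployment that handles sensitive data (score 12) outranks a critical finding in an isolated, non-sensitive batch job (score 10), which shows how context can reorder a severity-only list.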
Zero-trust networks in Kubernetes, cloud-native applications
Corporate networks and the distributed applications in them have grown more complex, and the threat models and the methods leveraged to deepen infiltration have followed suit. Secure perimeters can serve only as a first line of defense for internal networks, not as a comprehensive strategy for the protection of infrastructure and data. Robust security requires a combination of controls and strategies.
Zero-trust networking can serve as an important piece of the security puzzle by focusing on increasing the safety of internal application traffic. This model replaces the long-held tenet that all traffic inside a firewalled network is authentic with the opposite assumption: no network connection should be considered safe until it proves otherwise.
Traditionally, network administrators worked under the assumption that every entity, whether application, server, or piece of networking software or hardware, found in their internal networks belonged there and could be trusted. Some applications did not require authentication of client connections or relied on static, shared credentials, e.g. a password for a database. All applications had to handle any authentication or authorization schemes they needed, if they used any at all. Often, internal network connections, even those for sensitive services, did not use any encryption.
More than a few corporate networks still follow this pattern. However, a single bad actor that can be placed inside this lax environment, whether through a direct hack, a trojan horse applied accidentally by an authorized individual, or simply a hole in a network firewall, can wreak havoc by taking full advantage of this implicit network of trust. The possibilities may not be endless, but they are predictable. From sniffing of plaintext network packets to discovering application passwords to databases or other critical systems all the way to gaining control of network equipment, this scenario opens the door to unacceptable risks, including data exfiltration or loss.
Zero-trust forms the basis of a growing number of security-first production infrastructures. Instead of assuming every entity on a network can be trusted without verification, it assumes nothing can, not even the network infrastructure itself. The zero-trust framework does not offer a prescriptive implementation or specific set of technologies to use. Rather, it describes a set of principles and goals, leaving the specific technical details of implementation to each organization.
Zero-trust architectures
Zero-trust architectures generally follow these principles:
- Security controls should apply equally to all entities, whether software or hardware, regardless of their network location.
- Network connections should be authenticated at both ends, by the server and the client. Client authentication by the server is generally expected now, but clients should also verify that they have connected to a valid server. Connections should be re-authenticated and requests should be reauthorized as needed when they span more than a single transaction.
- Authorization grants should follow the principle of least privilege, allowing only the bare minimum permissions required for a client’s workload.
- All network connections and transactions should be subject to continuous monitoring for analysis.
Implementing a zero-trust model in Kubernetes
What would a zero-trust model look like in a Kubernetes cluster? While no single methodology for implementing zero-trust principles within a Kubernetes cluster exists, service meshes have emerged as a popular solution for many of the architecture’s goals.
Service meshes create a virtualized network layer to connect and control distributed application services. While most service mesh solutions initially did not focus on network security, but rather on facilitating and managing intelligent service discovery and request routing, the most popular open-source projects now offer features that fit in a zero-trust architecture. As many service meshes attempt to create an overlay that does not require modification of individual applications, they eliminate much of the burden of making significant changes to enable strict authentication and authorization controls.
Service meshes that support Kubernetes typically use decentralized, point-to-point routing by giving each individual cluster pod its own proxy instance. These proxies can manage client TLS certificates, which the proxy can use to prove its identity when making connections to other services or receiving connections from other clients. This use of TLS certificates to provide proof of identity on both the client and server sides is called mutual Transport Layer Security (mTLS). mTLS, besides performing connection authentication, also serves to encrypt the network connection. In addition to authentication and encryption over the wire, different service meshes support different authorization sources, ranging from static lists to integrations with third-party single sign-on or other services.
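To show what mutual TLS means in practice, the sketch below uses Python's standard `ssl` module to build one context for each side of a connection. It illustrates the concept only and is not how any particular service mesh implements it: in a mesh, the sidecar proxies perform these steps on the application's behalf. The function names are hypothetical, and the certificate paths are optional parameters so that the contexts can be constructed without files on disk.

```python
import ssl


def server_context(cert=None, key=None, client_ca=None):
    """Build a server-side context that requires a valid client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients that present no valid certificate
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # the server's own identity
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)  # CA that signs client certificates
    return ctx


def client_context(cert=None, key=None, server_ca=None):
    """Build a client-side context that verifies the server and presents its own certificate."""
    # The default client context already requires a valid server certificate
    # and checks that it matches the hostname.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    if cert:
        ctx.load_cert_chain(certfile=cert, keyfile=key)  # the client's own identity
    return ctx
```

With both contexts in place, each side proves its identity to the other during the TLS handshake, and the resulting connection is encrypted.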
Service meshes do not provide a complete zero-trust solution for Kubernetes clusters, but they do offer a number of its core benefits. Even if you cannot achieve a perfect zero-trust architecture in your Kubernetes clusters, any incremental changes you make in that direction will help to protect your cluster and its workloads.
How Red Hat can help
Securing cloud-native applications and the underlying infrastructure requires significant changes to an organization’s security approach—organizations must apply controls earlier in the application development life cycle, use built-in controls to enforce policies that prevent operational and scalability issues, and keep up with increasingly rapid release schedules.
Red Hat® Advanced Cluster Security for Kubernetes is a Kubernetes-native security platform that equips organizations to more securely build, deploy, and run cloud-native applications anywhere and accelerate innovation with confidence. The solution helps improve the security of the application build process, protect the application platform and configurations, and detect and respond to runtime issues.