Introduction
OpenShift comes by default with a static Grafana dashboard, which presents cluster metrics to cluster administrators. It is not possible to customize this Grafana instance.
However, many customers would like to create their own dashboards, their own monitoring, and their own alerting while getting the most out of OpenShift, without installing a separate monitoring stack.
So how can you create your own queries? How can you visualize them on custom dashboards, without having to install Prometheus or Alertmanager a second time?
The solution is simple: Since OpenShift 4.5 (as TechPreview) and OpenShift 4.6 (as GA), the default monitoring stack has been extended to support monitoring of user-defined projects. This additional configuration covers monitoring for your own projects.
In this article, we will see how to deploy the Grafana Operator and what issues can occur when connecting Grafana to OpenShift monitoring.
Overview
As a developer in OpenShift, you can create an application that exposes custom statistics about your application at the endpoint /metrics. Here is an example from the official OpenShift documentation:
# HELP http_requests_total Count of all HTTP requests
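For context, a full scrape of such a /metrics endpoint returns the Prometheus exposition format. Modeled on the documentation's prometheus-example-app, it looks roughly like this (sample values are illustrative):

```text
# HELP http_requests_total Count of all HTTP requests
# TYPE http_requests_total counter
http_requests_total{code="200",method="get"} 4
http_requests_total{code="404",method="get"} 1
# HELP version Version information about this binary
# TYPE version gauge
version{version="v0.1.0"} 1
```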
This metric can then be viewed inside OpenShift in the developer view under the menu Monitoring. If you go to "Monitoring > Metrics" and select "Custom Query" from the drop-down, you can enter, for example, the following PromQL query:
sum(rate(http_requests_total[2m]))
The following graph will be the result:
Figure 1. Custom Query
This is great! But … what happens if a customer would like to see their very own super-fancy Grafana dashboard? You cannot change the cluster dashboard. However, you can install your own Grafana instance, and one way to do so is by using the Custom Grafana Operator.
Architecture
The following image depicts a brief overview of the user-defined workload monitoring, Grafana Operator, and an example application:
- openshift-monitoring: This is the default cluster monitoring, which is always installed along with the cluster. It provides Prometheus and the Thanos Querier, as well as (not in the picture) a Grafana dashboard showing cluster metrics.
- openshift-user-workload-monitoring: This is responsible for customer workload monitoring. It deploys its own instance of Prometheus, which is queried by the Thanos Querier. This instance scrapes custom monitoring metrics based on so-called ServiceMonitor objects, which are defined in no. 4.
- grafana-operator namespace: This is the namespace where the Grafana Operator will be deployed. It holds several custom resources, like GrafanaDashboard and GrafanaDataSource. Grafana uses the Thanos Querier to fetch and visualize the appropriate metrics.
- An example namespace (ns1): This provides an example application and a ServiceMonitor object. The ServiceMonitor defines how the application's metrics are scraped.
Before We Begin
The following should be prepared before you get started:
1: OpenShift 4.6 (4.5 is also possible. However, user-defined workload monitoring is only available as TechPreview there.)
2: Enabled user-defined workload monitoring.
A: Create and apply the following manifest for the cluster monitoring:
apiVersion: v1
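For reference, the ConfigMap that enables user-defined workload monitoring in OpenShift 4.6 looks like this (a sketch; check the documentation for your exact cluster version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```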
B: Create and apply the following manifest for the user-defined monitoring:
apiVersion: v1
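The user-workload ConfigMap can start out minimal; the settings under config.yaml (here a retention period, purely illustrative) are optional:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: 24h
```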
3: Create an example project with user-defined workload monitoring. In this use case, we create the namespace “ns1” with an application that provides example metrics:
apiVersion: v1
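A minimal version of such a project, modeled on the documentation's prometheus-example-app (the image tag and object names are the documentation's example values), could look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-example-app
  namespace: ns1
  labels:
    app: prometheus-example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-example-app
  template:
    metadata:
      labels:
        app: prometheus-example-app
    spec:
      containers:
        - name: prometheus-example-app
          image: quay.io/brancz/prometheus-example-app:v0.2.0
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-example-app
  namespace: ns1
  labels:
    app: prometheus-example-app
spec:
  selector:
    app: prometheus-example-app
  ports:
    - name: web
      port: 8080
---
# The ServiceMonitor tells the user-workload Prometheus to scrape the service port "web"
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: ns1
spec:
  endpoints:
    - interval: 30s
      port: web
      scheme: http
  selector:
    matchLabels:
      app: prometheus-example-app
```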
NOTE: Further information can be found in the official documentation at https://docs.openshift.com/container-platform/4.6/monitoring/enabling-monitoring-for-user-defined-projects.html.
Deploy Custom Grafana Operator
As for any community operator, the following must be considered:
Community Operators are operators that have not been vetted or verified by Red Hat. Community Operators should be used with caution because their stability is unknown. Red Hat provides no support for Community Operators.
The community Grafana Operator must be deployed to its own namespace, for example grafana. Create this namespace first (oc new-project grafana), then search for and install the Grafana Operator from the OperatorHub.
You can use the default values; just be sure to select the desired namespace.
After a few minutes, the operator should be available:
Figure 2. Installed Community Grafana Operator
Setup Grafana Operator
Before we can use Grafana to draw beautiful images, it must be configured. We need to create an instance of Grafana. Ideally, OpenShift OAuth is leveraged right away, to avoid having to create user accounts manually inside Grafana.
OAuth requires some objects, which must be created before the actual Grafana instance. The following YAMLs are taken from the operator documentation.
Create the following inside the Grafana namespace:
- A session secret for the proxy … change the password!
- A cluster role grafana-proxy
- A cluster role binding for the role
- A config map injecting trusted CA bundles
apiVersion: v1
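Sketches of these four objects, following the grafana-operator OAuth example (object names and the session secret value are illustrative):

```yaml
# Session secret for the oauth-proxy sidecar - change the value!
apiVersion: v1
kind: Secret
metadata:
  name: grafana-k8s-proxy
  namespace: grafana
type: Opaque
stringData:
  session_secret: change-me-please
---
# Cluster role allowing the proxy to perform token and access reviews
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: grafana-proxy
rules:
  - apiGroups: ["authentication.k8s.io"]
    resources: ["tokenreviews"]
    verbs: ["create"]
  - apiGroups: ["authorization.k8s.io"]
    resources: ["subjectaccessreviews"]
    verbs: ["create"]
---
# Bind the role to the serviceaccount used by Grafana
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-proxy
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: grafana-proxy
subjects:
  - kind: ServiceAccount
    name: grafana-serviceaccount
    namespace: grafana
---
# Empty config map; OpenShift injects the trusted CA bundle via the label
apiVersion: v1
kind: ConfigMap
metadata:
  name: ocp-injected-certs
  namespace: grafana
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
```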
Note: Be sure to use your selected namespace. In this example, the namespace “grafana” is used.
Now you can create the following instance under:
"Installed Operators > Grafana Operator > Grafana > Create Grafana > YAML View" (or, as an alternative, via the CLI)
apiVersion: integreatly.org/v1alpha1
<1> Some default settings, which can be modified if required
<2> A default administrative user
<3> A datastore using a persistent volume. Other options would be ephemeral storage or another database. This might be especially important if you would like HA for your Grafana.
<4> Container arguments; the openshift-sar line is important here because it affects the OAuth behavior
<5> Be sure to use your selected namespace.
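A sketch of the Grafana instance matching the callouts above, based on the grafana-operator v3 OAuth example (image tag, admin credentials, and storage size are illustrative; the volume mounts for the session secret and CA bundle are trimmed for brevity):

```yaml
apiVersion: integreatly.org/v1alpha1
kind: Grafana
metadata:
  name: grafana-oauth
  namespace: grafana        # <5> your selected namespace
spec:
  config:                   # <1> default settings
    log:
      mode: console
      level: warn
    auth:
      disable_login_form: false
      disable_signout_menu: true
    security:               # <2> default administrative user
      admin_user: admin
      admin_password: secret
  dataStorage:              # <3> persistent volume for the datastore
    accessModes:
      - ReadWriteOnce
    size: 2Gi
  containers:               # <4> oauth-proxy sidecar; note the openshift-sar line
    - name: grafana-proxy
      image: quay.io/openshift/origin-oauth-proxy:4.6
      args:
        - '-provider=openshift'
        - '-pass-basic-auth=false'
        - '-https-address=:9091'
        - '-http-address='
        - '-email-domain=*'
        - '-upstream=http://localhost:3000'
        - '-openshift-sar={"namespace": "grafana", "resource": "services", "verb": "get"}'
        - '-openshift-service-account=grafana-serviceaccount'
        - '-cookie-secret-file=/etc/proxy/secrets/session_secret'
        - '-skip-auth-regex=^/metrics'
      ports:
        - containerPort: 9091
          name: grafana-proxy
```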
After a few moments, the operator picks up the change and creates a Grafana pod.
Adding a Data Source
The next step is to connect your custom Grafana to Prometheus, or rather to the Thanos Querier. To do so, you add a role to the Grafana serviceaccount and create a GrafanaDataSource custom resource.
At this moment, we will work with the cluster role cluster-monitoring-view. However, the problem that can result is discussed later.
1: Add the role to the Grafana serviceaccount:
oc adm policy add-cluster-role-to-user cluster-monitoring-view -z grafana-serviceaccount
2: Export the Grafana namespace and the serviceaccount's bearer token:
export GRAFANA_NAMESPACE=grafana
export BEARER_TOKEN=$(oc sa get-token grafana-serviceaccount -n $GRAFANA_NAMESPACE)
3: Prepare the following yaml file as grafana-datasource.yaml:
apiVersion: integreatly.org/v1alpha1
- Note: Thanos default querier URL … this might cause problems (see below)
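A sketch of the data source, pointing at the Thanos Querier URL and carrying a bearer-token header that envsubst fills in from $BEARER_TOKEN (object names are illustrative):

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus-grafanadatasource
  namespace: grafana
spec:
  name: prometheus-grafanadatasource.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      # Thanos default querier URL ... this might cause problems (see below)
      url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
      isDefault: true
      version: 1
      editable: true
      jsonData:
        httpHeaderName1: 'Authorization'
        timeInterval: 5s
        tlsSkipVerify: true
      secureJsonData:
        # Substituted by envsubst when the manifest is applied
        httpHeaderValue1: 'Bearer ${BEARER_TOKEN}'
```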
4: Apply the file, substituting the exported variables:
envsubst < grafana-datasource.yaml | oc -n $GRAFANA_NAMESPACE apply -f -
The operator will now restart the Grafana pod to add the newest changes, which should not take more than a few seconds. Grafana can be used now. Dashboards can be created … but let’s run some tests with PromQL queries instead.
Let’s Test
Log in to your Grafana using OAuth as a cluster administrator.
You could also use a non-cluster administrator if the user is able to GET the services of the Grafana namespace. The reason is the following line in the Grafana CRD: -openshift-sar={"namespace": "grafana", "resource": "services", "verb": "get"}, which defines that OAuth will work for everybody who can get the services. This can be changed according to personal needs, but for this test, it is good enough.
Then use the credentials for the admin account, which have been defined while creating the Grafana instance.
You will be logged in now, and since there are no Dashboards, let’s go to Explore to enter some custom PromQL queries. For this instance, we will use our example from above:
sum(rate(http_requests_total[2m]))
Figure 3. First Query
This is looking good.
Let’s give it another try and sort by namespaces:
sum(rate(http_requests_total[2m])) by (namespace)
Figure 4. Second Query - showing internal namespace
What is this? I see a namespace that is actually meant for the cluster (openshift-monitoring).
Let’s try another query using a different metric:
sum(node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate) by (namespace)
Figure 5. Third Query - shows even more namespaces
OK, so we have access to all namespaces on the cluster.
Why Do I See All Namespaces?
What does this mean? Well, it means that we have access to all namespaces of the cluster. We see everything. This makes sense since we assigned the cluster role "cluster-monitoring-view" to the serviceaccount of Grafana.
But what if we want to show only objects from a specific namespace, such as giving developers the ability to create their own dashboards without having view access to the whole cluster?
The first test might be to remove the cluster-monitoring-view privileges from the Grafana serviceaccount.
This will lead to an error in Grafana itself, since it can no longer access the Thanos Querier, which we configured with: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
How does the OpenShift web UI actually work when you, as a developer, run one of the above queries?
Let’s try that:
Figure 6. Query using the OpenShift UI
It works! It shows the namespace of the developer and only this namespace. When you inspect the actual network traffic, you will see that OpenShift automatically adds the URL parameter namespace=ns1 to the request URL:
https://your-cluster/api/prometheus-tenancy/api/v1/query?namespace=ns1&query=sum%28node_namespace_pod_container%3Acontainer_cpu_usage_seconds_total%3Asum_rate%29+by+%28namespace%29
This is good information. Let's try this using the Grafana Data Source.
It is currently not possible to perform this configuration using the GrafanaDataSource CRD. Instead, it must be done directly in the Grafana data source configuration. There is an open ticket at: https://github.com/integr8ly/grafana-operator/issues/309.
Log in to Grafana as an administrator and switch to "Configuration > Data Sources > Prometheus". At the very bottom, add namespace=ns1 to the Custom query parameters:
Figure 7. Configure Grafana Data Source
At this point, the Grafana serviceaccount still has cluster-monitoring-view privileges.
As you can see in the following image, this configuration did not help:
Figure 8. Query after Data Source has manually been modified
Thanos Querier Versus Thanos Querier
To summarize, in the OpenShift UI everything works, but when using the Grafana dashboard, we see all namespaces from the cluster. Let’s try to find out how OpenShift does this.
When we check the Thanos Querier service, we will see three ports:
ports:
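On an OpenShift 4.6 cluster, the thanos-querier Service exposes roughly the following ports (names may differ slightly between versions):

```yaml
ports:
  - name: web
    port: 9091
    targetPort: web
  - name: tenancy
    port: 9092
    targetPort: tenancy
  - name: tenancy-rules
    port: 9093
    targetPort: tenancy-rules
```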
Currently, we configured port 9091, but there is another one, called tenancy. Maybe this is what we need? Let’s try it:
1: Change the CRD GrafanaDataSource to use port 9092 (instead of 9091). This will restart the pod and remove the custom query parameter we configured earlier.
2: Remove the cluster role:
oc adm policy remove-cluster-role-from-user cluster-monitoring-view -z grafana-serviceaccount
3: Instead, grant the serviceaccount view permissions on the namespace ns1:
oc adm policy add-role-to-user view system:serviceaccount:grafana:grafana-serviceaccount -n ns1
4: Log into Grafana as administrator and manually change the Data Source and add namespace=ns1 to the setting Custom query parameters.
5: Rerun the query; you will now see only one namespace:
Figure 9. Query with Thanos Querier on port 9092
What Happened?
So what actually happened here? We have two important ports for our Thanos Querier: 9091 and 9092.
When we check the Deployment of the Thanos Querier for these ports, the configuration for port 9091 looks like the following:
spec:
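Trimmed to the relevant part, the oauth-proxy sidecar serving port 9091 looks roughly like this (argument list abbreviated to the relevant flags; image tag illustrative):

```yaml
spec:
  containers:
    - name: oauth-proxy
      image: quay.io/openshift/origin-oauth-proxy:4.6
      args:
        - '-provider=openshift'
        - '-https-address=:9091'
        - '-http-address='
        - '-upstream=http://localhost:9090'
        - '-openshift-service-account=thanos-querier'
        # OAuth check: the caller must be able to GET "namespaces"
        - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
        - '-openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}'
      ports:
        - containerPort: 9091
          name: web
```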
There is an OAuth setting that indicates you need the privilege to GET "namespaces" objects.
The only cluster role that has this privilege, which is also mentioned by the official OpenShift documentation, is cluster-monitoring-view:
- apiVersion: rbac.authorization.k8s.io/v1
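The role itself is small; shown here reduced to the relevant rule, it grants exactly the GET-on-namespaces privilege the proxy checks for:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-monitoring-view
rules:
  - apiGroups:
      - ""
    resources:
      - namespaces
    verbs:
      - get
```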
As we have seen above, this will show you all namespaces available on the cluster.
When you check port 9092, there is no such OAuth configuration. This port is actually served by the kube-rbac-proxy container. It does not require OAuth, but instead requires the namespace URL parameter.
Details can be found at: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/user-workload-monitoring.md
In short, the whole setup looks like this:
Figure 10. Thanos interconnecting containers
While port 9091 goes directly to Thanos, it requires the cluster-monitoring-view role. Port 9092 does not, but instead you must send the namespace= URL parameter.
Summary
While both options are valid, remember the following about the Grafana Operator:
- Currently, the URL parameter can only be set in Grafana directly; the operator will ignore it. An open ticket in the project is meant to address this, but it is not yet implemented: https://github.com/integr8ly/grafana-operator/issues/309
- The URL parameter setting will be lost when the Grafana pod is restarted, which might lead to problems.
- While the Grafana serviceaccount does not require cluster permissions, it does require permission to view the appropriate namespace.
- All of the above also means that you would need to create a new Data Source for every project you want to monitor. I was not able to find a way to send multiple namespaces in the URL parameter.
Is it useful to use the Grafana Operator at all? Probably yes: operators are the future, and this one is actively developed. Nevertheless, it is always possible to deploy Grafana manually instead.