What is Kubernetes monitoring?
Monitoring is a way to understand the inner workings of a system. It provides visibility into the infrastructure and the applications running on it, which helps minimize downtime. It also makes a containerized infrastructure easier to manage by tracking CPU, memory and storage usage over time. Kubernetes Operators help with this task, making it possible to see how many pods are failing or succeeding, whether resources are exceeding their limits, and so on.
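For a quick look at this kind of data from the command line, the sketch below uses standard kubectl commands (kubectl top requires the metrics-server or an equivalent metrics API to be installed on the cluster):
# Show current CPU and memory usage per pod (requires a metrics API)
kubectl top pods -A
# List pods that are not in the Running phase
kubectl get pods -A --field-selector=status.phase!=Running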
Why is monitoring important?
The rapid growth of infrastructure and containers in the enterprise is a great advantage for software developers, DevOps engineers and IT teams, but Kubernetes brings new challenges in managing large infrastructures. As microservices applications become increasingly common in enterprise-level businesses, monitoring the components involved grows more complex. This is why monitoring remains critical today.
Monitoring tools
Some common monitoring tools are listed below:
- Prometheus
- Grafana
- Dynatrace
- Splunk
- ELK Stack
- AWS CloudWatch
- Google Cloud Monitoring
- Azure Monitor
This demo uses Prometheus and its components to monitor an application and send alerts to Slack through Alertmanager.
Demonstration steps
Step 1: Create namespaces:
oc create ns monitor
oc create ns blue
Step 2: Deploy the BLUE app:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  ports:
  - port: 9000
    name: http
    nodePort: 32002
  selector:
    app: blue
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  selector:
    matchLabels:
      app: blue
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        app: blue
        version: v1
    spec:
      serviceAccountName: blue
      containers:
      - image: docker.io/cmwylie19/go-metrics-ex
        name: blue
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          successThreshold: 1
          periodSeconds: 10
          httpGet:
            path: /
            port: 9000
        livenessProbe:
          initialDelaySeconds: 10
          periodSeconds: 10
          httpGet:
            path: /
            port: 9000
        ports:
        - containerPort: 9000
          name: http
        imagePullPolicy: Always
      restartPolicy: Always
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: blue
  namespace: blue
EOF
As an alternative, save the manifests above to a file named app.yaml and apply it:
oc apply -f app.yaml
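Before wiring up Prometheus, it is worth confirming that the app came up and that its Service has endpoints (a quick check; the go-metrics-ex image is assumed to expose Prometheus metrics on port 9000):
kubectl get pods -n blue
kubectl get endpoints blue -n blue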
Step 3: Install the Observability Operator:
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: observability-operator
  namespace: openshift-marketplace
spec:
  displayName: Observability Operator - Test
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/rhobs/observability-operator-catalog:latest
  publisher: Twinkll Sisodia
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1m0s
EOF
Another option is to navigate to OperatorHub and install the Observability Operator from the dashboard.
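Note that the CatalogSource only makes the operator available in OperatorHub. To finish the installation from the CLI instead of the console, a Subscription along these lines should work (the channel name here is an assumption; list the available channels with oc describe packagemanifest observability-operator -n openshift-marketplace):
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: observability-operator
  namespace: openshift-operators
spec:
  channel: development # assumed channel; verify against the packagemanifest
  name: observability-operator
  source: observability-operator
  sourceNamespace: openshift-marketplace
EOF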
Step 4: Create an Observability Operator instance:
kubectl apply -f - <<EOF
kind: MonitoringStack
apiVersion: monitoring.rhobs/v1alpha1
metadata:
  labels:
    mso: blue
  name: blue-monitoring-stack
  namespace: monitor
spec:
  logLevel: debug
  resourceSelector:
    matchLabels:
      app: blue # must match the labels on the ServiceMonitor created in Step 5
  retention: 1d
EOF
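The operator responds by creating the monitoring pods in the monitor namespace. The pod names follow the Prometheus Operator convention of prometheus-<stack-name>-<ordinal>, so expect something like prometheus-blue-monitoring-stack-0 (an assumption worth verifying):
kubectl get pods -n monitor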
Step 5: Create the ServiceMonitor:
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blue
  namespace: monitor
  labels:
    app: blue
spec:
  selector:
    matchLabels:
      app: blue
  namespaceSelector:
    matchNames:
    - blue
  endpoints:
  - port: http
EOF
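To confirm the ServiceMonitor carries the labels the MonitoringStack's resourceSelector matches on:
kubectl get servicemonitor blue -n monitor --show-labels
Once Prometheus reloads its configuration, the blue endpoint should appear under Status > Targets in the Prometheus UI.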
Step 6: Create the PrometheusRules:
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: blue # matches the MonitoringStack resourceSelector so the rules are loaded
    prometheus: blue
    role: alert-rules
  name: blue-rules
  namespace: monitor
spec:
  groups:
  - name: recording_rules
    interval: 2s
    rules:
    - record: blue_requests_per_minute
      expr: increase(http_requests_total{container="blue"}[1m])
  - name: LoadRules
    rules:
    - alert: HighLoadBlue
      expr: blue_requests_per_minute >= 10
      labels:
        severity: page # or critical
      annotations:
        summary: "high load average"
        description: "high load average"
    - alert: MediumLoadBlue
      expr: blue_requests_per_minute >= 5
      labels:
        severity: warn
      annotations:
        summary: "medium load average"
        description: "medium load average"
    - alert: LowLoadBlue
      expr: blue_requests_per_minute >= 1
      labels:
        severity: acknowledged
      annotations:
        summary: "low load average"
        description: "low load average"
EOF
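The recording rule precomputes the one-minute request increase that the three alerts compare against. To sanity-check it, the underlying expression can be queried directly once the port-forward from Step 9 is running (a sketch; after 15 requests in the last minute it should return a value near 15):
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode 'query=increase(http_requests_total{container="blue"}[1m])'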
Step 7: Create role-based access control (RBAC) rules:
oc create clusterrole blue-view --verb=get,list,watch --resource=pods,pods/status,services,endpoints
oc create clusterrolebinding blue-crb --clusterrole=blue-view --serviceaccount=monitor:blue-prometheus
The service account in the binding must match the one the MonitoringStack created for Prometheus; verify the name with oc get sa -n monitor.
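A quick way to confirm the binding took effect is to check permissions while impersonating the service account:
oc auth can-i list endpoints --as=system:serviceaccount:monitor:blue-prometheus
This should print yes once the ClusterRoleBinding is in place.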
Step 8: Create the Alertmanager Secret:
Save the following Alertmanager configuration as alertmanager.yaml, replacing your_api-key with your Slack incoming webhook URL:
route:
  group_by: [alertname, cluster, service, job]
  receiver: "slack"
  repeat_interval: 1m
  group_interval: 1m
  group_wait: 10s
  routes:
  - match:
      severity: 'warn'
    receiver: "slack"
  - match:
      severity: 'acknowledged'
    receiver: "slack"
  - match:
      severity: 'page'
    receiver: "slack"
receivers:
- name: "slack"
  slack_configs:
  - api_url: 'your_api-key'
    channel: '#monitoring-alerts'
    send_resolved: true
templates: []
Then create the Secret from the file:
oc create secret generic alertmanager-blue --from-file=alertmanager.yaml=alertmanager.yaml
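If the amtool CLI (shipped with Alertmanager releases) is available locally, the configuration can be validated before loading it:
amtool check-config alertmanager.yaml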
Step 9: Trigger alerts:
You must issue requests to the blue application to trigger the alerts in Prometheus. Because the thresholds apply to the blue_requests_per_minute recording rule, the requests need to land within the same one-minute window:
- 1 or more requests triggers LowLoadBlue
- 5 or more requests triggers MediumLoadBlue
- 10 or more requests triggers HighLoadBlue
Create a curler pod (with a matching service) to call the blue service:
kubectl run curler --image=nginx:alpine --port=80 --expose
Curl the blue app 15 times (blue.blue addresses the service in the blue namespace from the curler pod's namespace):
for z in $(seq 15); do kubectl exec -it pod/curler -- curl blue.blue:9000/; done
Output:
OKOKOKOKOKOKOKOKOKOKOKOKOKOKOK
Next, check the alerts from the Prometheus UI. This sometimes takes a few minutes. Make sure the targets have populated in the console; if they are not up yet, wait before checking for alerts:
kubectl port-forward -n monitor pod/prometheus-blue-monitoring-stack-0 9090
The Prometheus pod name is derived from the MonitoringStack name; confirm it with kubectl get pods -n monitor.
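With the port-forward running, active alerts can also be pulled straight from the Prometheus HTTP API:
curl -s http://localhost:9090/api/v1/alerts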
Wrap up
Monitoring is an essential part of any container infrastructure. It enables IT staff to better maintain systems, minimize downtime and manage resource utilization. Many tools exist, but the steps above apply specifically to Prometheus.
Consider using these steps to establish your own Kubernetes monitoring solution and gain a better understanding of your containerized environment.
About the author
Twinkll Sisodia is a Senior Software Engineer at Red Hat, where she leads initiatives focused on OpenShift AI, observability, and partner integrations. With a strong background in AI infrastructure and platform automation, she works at the intersection of engineering and collaboration, building scalable, production-ready solutions for enterprise AI workloads. Twinkll has been instrumental in driving observability for large language models using tools like OpenTelemetry and Dynatrace, and actively contributes to Red Hat's AI kickstarts and open-source projects. She collaborates closely with partners to integrate cutting-edge technologies into OpenShift AI, enhancing visibility, performance, and usability. Passionate about building impactful solutions and driving innovation, she continues to shape the future of AI platform engineering.