What is Kubernetes monitoring?
Monitoring is a way to understand the inner workings of a system. It provides visibility into the infrastructure and the applications running on it, which helps minimize downtime. It also makes a containerized infrastructure easier to manage by tracking CPU, memory and storage usage over time. Kubernetes Operators help with this task, making it possible to see how many pods are failing or succeeding, whether resources are exceeding their limits, and so on.
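For a quick look at this kind of data from the command line, the sketch below uses standard kubectl commands (kubectl top requires the metrics-server or an equivalent metrics API to be installed on the cluster):
# Show current CPU and memory usage per pod (requires a metrics API)
kubectl top pods -A
# List pods that are not in the Running phase
kubectl get pods -A --field-selector=status.phase!=Running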
Why is monitoring important?
The rapid growth of infrastructure and containers in the enterprise is a great advantage for software developers, DevOps engineers and IT teams, but Kubernetes brings new challenges in managing large infrastructures. As microservices applications become increasingly common in enterprise-level businesses, monitoring the components involved grows more complex. This is why monitoring remains critical today.
Monitoring tools
Some common monitoring tools are listed below:
- Prometheus
- Grafana
- Dynatrace
- Splunk
- ELK Stack
- AWS CloudWatch
- Google Cloud Monitoring
- Azure Monitor
This demo uses Prometheus and its components to monitor an application and send alerts to Slack through Alertmanager.
Demonstration steps
Step 1: Create namespaces:
oc create ns monitor
oc create ns blue
Step 2: Deploy the BLUE app:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  ports:
  - port: 9000
    name: http
    nodePort: 32002
  selector:
    app: blue
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blue
    version: v1
  name: blue
  namespace: blue
spec:
  selector:
    matchLabels:
      app: blue
      version: v1
  replicas: 1
  template:
    metadata:
      labels:
        app: blue
        version: v1
    spec:
      serviceAccountName: blue
      containers:
      - image: docker.io/cmwylie19/go-metrics-ex
        name: blue
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          successThreshold: 1
          periodSeconds: 10
          httpGet:
            path: /
            port: 9000
        livenessProbe:
          initialDelaySeconds: 10
          periodSeconds: 10
          httpGet:
            path: /
            port: 9000
        ports:
        - containerPort: 9000
          name: http
        imagePullPolicy: Always
      restartPolicy: Always
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: blue
  namespace: blue
EOF
As an alternative, save the manifests above to a file named app.yaml and apply it:
oc apply -f app.yaml
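Before wiring up Prometheus, it is worth confirming that the app came up and that its Service has endpoints (a quick check; the go-metrics-ex image is assumed to expose Prometheus metrics on port 9000):
kubectl get pods -n blue
kubectl get endpoints blue -n blue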
Step 3: Install the Observability Operator:
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: observability-operator
  namespace: openshift-marketplace
spec:
  displayName: Observability Operator - Test
  icon:
    base64data: ""
    mediatype: ""
  image: quay.io/rhobs/observability-operator-catalog:latest
  publisher: Twinkll Sisodia
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1m0s
EOF
Another option is to navigate to OperatorHub and install the Observability Operator from the dashboard.
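Note that the CatalogSource only makes the operator available in OperatorHub. To finish the installation from the CLI instead of the console, a Subscription along these lines should work (the channel name here is an assumption; list the available channels with oc describe packagemanifest observability-operator -n openshift-marketplace):
kubectl apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: observability-operator
  namespace: openshift-operators
spec:
  channel: development # assumed channel; verify against the packagemanifest
  name: observability-operator
  source: observability-operator
  sourceNamespace: openshift-marketplace
EOF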
Step 4: Create an Observability Operator instance:
kubectl apply -f - <<EOF
kind: MonitoringStack
apiVersion: monitoring.rhobs/v1alpha1
metadata:
  labels:
    mso: blue
  name: blue-monitoring-stack
  namespace: monitor
spec:
  logLevel: debug
  resourceSelector:
    matchLabels:
      app: blue # must match the labels on the ServiceMonitor created in Step 5
  retention: 1d
EOF
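The operator responds by creating the monitoring pods in the monitor namespace. The pod names follow the Prometheus Operator convention of prometheus-<stack-name>-<ordinal>, so expect something like prometheus-blue-monitoring-stack-0 (an assumption worth verifying):
kubectl get pods -n monitor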
Step 5: Create the ServiceMonitor:
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blue
  namespace: monitor
  labels:
    app: blue
spec:
  selector:
    matchLabels:
      app: blue
  namespaceSelector:
    matchNames:
    - blue
  endpoints:
  - port: http
EOF
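To confirm the ServiceMonitor carries the labels the MonitoringStack's resourceSelector matches on:
kubectl get servicemonitor blue -n monitor --show-labels
Once Prometheus reloads its configuration, the blue endpoint should appear under Status > Targets in the Prometheus UI.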
Step 6: Create the PrometheusRules:
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: blue # matches the MonitoringStack resourceSelector so the rules are loaded
    prometheus: blue
    role: alert-rules
  name: blue-rules
  namespace: monitor
spec:
  groups:
  - name: recording_rules
    interval: 2s
    rules:
    - record: blue_requests_per_minute
      expr: increase(http_requests_total{container="blue"}[1m])
  - name: LoadRules
    rules:
    - alert: HighLoadBlue
      expr: blue_requests_per_minute >= 10
      labels:
        severity: page # or critical
      annotations:
        summary: "high load average"
        description: "high load average"
    - alert: MediumLoadBlue
      expr: blue_requests_per_minute >= 5
      labels:
        severity: warn
      annotations:
        summary: "medium load average"
        description: "medium load average"
    - alert: LowLoadBlue
      expr: blue_requests_per_minute >= 1
      labels:
        severity: acknowledged
      annotations:
        summary: "low load average"
        description: "low load average"
EOF
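The recording rule precomputes the one-minute request increase that the three alerts compare against. To sanity-check it, the underlying expression can be queried directly once the port-forward from Step 9 is running (a sketch; after 15 requests in the last minute it should return a value near 15):
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode 'query=increase(http_requests_total{container="blue"}[1m])'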
Step 7: Create role-based access control (RBAC) rules:
oc create clusterrole blue-view --verb=get,list,watch --resource=pods,pods/status,services,endpoints
oc create clusterrolebinding blue-crb --clusterrole=blue-view --serviceaccount=monitor:blue-prometheus
The service account in the binding must match the one the MonitoringStack created for Prometheus; verify the name with oc get sa -n monitor.
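A quick way to confirm the binding took effect is to check permissions while impersonating the service account:
oc auth can-i list endpoints --as=system:serviceaccount:monitor:blue-prometheus
This should print yes once the ClusterRoleBinding is in place.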
Step 8: Create the Alertmanager Secret:
Save the following Alertmanager configuration as alertmanager.yaml, replacing your_api-key with your Slack incoming webhook URL:
route:
  group_by: [alertname, cluster, service, job]
  receiver: "slack"
  repeat_interval: 1m
  group_interval: 1m
  group_wait: 10s
  routes:
  - match:
      severity: 'warn'
    receiver: "slack"
  - match:
      severity: 'acknowledged'
    receiver: "slack"
  - match:
      severity: 'page'
    receiver: "slack"
receivers:
- name: "slack"
  slack_configs:
  - api_url: 'your_api-key'
    channel: '#monitoring-alerts'
    send_resolved: true
templates: []
Then create the Secret from the file:
oc create secret generic alertmanager-blue --from-file=alertmanager.yaml=alertmanager.yaml
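If the amtool CLI (shipped with Alertmanager releases) is available locally, the configuration can be validated before loading it:
amtool check-config alertmanager.yaml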
Step 9: Trigger alerts:
You must issue requests to the blue application to trigger the alerts in Prometheus. Because the thresholds apply to the blue_requests_per_minute recording rule, the requests need to land within the same one-minute window:
- 1 or more requests triggers LowLoadBlue
- 5 or more requests triggers MediumLoadBlue
- 10 or more requests triggers HighLoadBlue
Create a curler pod (with a matching service) to call the blue service:
kubectl run curler --image=nginx:alpine --port=80 --expose
Curl the blue app 15 times (blue.blue addresses the service in the blue namespace from the curler pod's namespace):
for z in $(seq 15); do kubectl exec -it pod/curler -- curl blue.blue:9000/; done
Output:
OKOKOKOKOKOKOKOKOKOKOKOKOKOKOK
Next, check the alerts from the Prometheus UI. This sometimes takes a few minutes. Make sure the targets have populated in the console; if they are not up yet, wait before checking for alerts:
kubectl port-forward -n monitor pod/prometheus-blue-monitoring-stack-0 9090
The Prometheus pod name is derived from the MonitoringStack name; confirm it with kubectl get pods -n monitor.
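With the port-forward running, active alerts can also be pulled straight from the Prometheus HTTP API:
curl -s http://localhost:9090/api/v1/alerts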
Wrap up
Monitoring is an essential part of any container infrastructure. It enables IT staff to better maintain systems, minimize downtime and manage resource utilization. Many tools exist, but the steps above apply specifically to Prometheus.
Consider using these steps to establish your own Kubernetes monitoring solution and gain a better understanding of your containerized environment.
About the author
Twinkll Sisodia is a Senior Software Engineer at Red Hat, where she leads initiatives focused on OpenShift AI, observability, and partner integrations. With a strong background in AI infrastructure and platform automation, she works at the intersection of engineering and collaboration, building scalable, production-ready solutions for enterprise AI workloads. Twinkll has been instrumental in driving observability for large language models using tools like OpenTelemetry and Dynatrace, and actively contributes to Red Hat's AI kickstarts and open-source projects. She collaborates closely with partners to integrate cutting-edge technologies into OpenShift AI, enhancing visibility, performance, and usability. Passionate about building impactful solutions and driving innovation, she continues to shape the future of AI platform engineering.