This post shows how to gather Apache Spark metrics with Prometheus and display them with Grafana in OpenShift 3.9. We start with a description of the environment, then show how to set up Spark, Prometheus, and Grafana.
Environment Overview
This is the environment we’re working with. The steps we walk through should apply to slightly newer/older versions of these packages, but it’s possible that you’ll have to tweak things for different releases.
- Red Hat Enterprise Linux 7.5
- OpenShift Container Platform 3.9 Cluster
- Container Runtime: docker-1.13.1
- OpenShift Prometheus 3.9
- Apache Spark 2.3.0/Oshinko 0.5.4
- Grafana 4.7.0-pre1
Part 1: Host Cluster Preparation
Install Red Hat Enterprise Linux 7.5 on all nodes in the cluster, then install OpenShift 3.9 with OpenShift Prometheus enabled. Use the host preparation (Section 2.3) and installation guide to install OpenShift 3.9 and OpenShift Prometheus.
Create an inventory file with Prometheus enabled. We've included an example inventory file with Prometheus enabled, a single master, a single etcd, and multiple nodes:
[OSEv3:children]
masters
nodes
etcd
nfs
[OSEv3:vars]
openshift_release=v3.9
# to obtain the ip address of your router for the default
# subdomain setting
# oc get pod router-1-xbq2t --template={{.status.hostIP}}
openshift_master_default_subdomain=X.X.X.X.nip.io
openshift_enable_service_catalog=false
openshift_enable_unsupported_configurations=True
#enable NTP
openshift_clock_enabled=true
# Allow any user by default
deployment_type=openshift-enterprise
openshift_master_identity_providers=[{'name': 'htpasswd_auth', \
'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider', \
'filename': '/etc/origin/master/htpasswd'}]
ansible_ssh_user=root
# Add multi-tenant sdn
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
# Cluster metrics are turned off in this example. To enable them,
# set this to true
openshift_metrics_install_metrics=false
#Enable prometheus
openshift_hosted_prometheus_deploy=true
openshift_prometheus_namespace=openshift-metrics
openshift_prometheus_node_selector={"region":"infra"}
# added for Non-persistent Storage
openshift_metrics_cassandra_storage_type=emptydir
# set up persistent storage for the registry
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/exports
openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=100Gi
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_nfs_directory=/exports
openshift_metrics_storage_nfs_options='*(rw,root_squash)'
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=1Gi
# set up prometheus storage
openshift_prometheus_storage_kind=nfs
openshift_prometheus_alertmanager_storage_kind=nfs
openshift_prometheus_alertbuffer_storage_kind=nfs
openshift_prometheus_storage_access_modes=['ReadWriteOnce']
openshift_prometheus_storage_nfs_directory=/exports
openshift_prometheus_storage_nfs_options='*(rw,root_squash)'
openshift_prometheus_storage_volume_name=prometheus
openshift_prometheus_storage_volume_size=10Gi
openshift_prometheus_storage_labels={'storage': 'prometheus'}
openshift_prometheus_storage_type='pvc'
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_options="*(rw,root_squash,sync,no_wdelay)"
openshift_hosted_etcd_storage_nfs_directory=/exports
openshift_hosted_etcd_storage_volume_name=etcd-vol2
openshift_hosted_etcd_storage_access_modes=["ReadWriteOnce"]
openshift_hosted_etcd_storage_volume_size=1G
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}
openshift_logging_storage_kind=nfs
openshift_logging_storage_access_modes=['ReadWriteOnce']
openshift_logging_storage_nfs_directory=/exports
openshift_logging_storage_nfs_options='*(rw,root_squash)'
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=10Gi
# disable cockpit installation
osm_use_cockpit=false
[masters]
node01.example.com
[etcd]
node01.example.com
[nfs]
node01.example.com
[nodes]
node01.example.com openshift_schedulable=True
node02.example.com openshift_node_labels="{'region': 'primary', \
'zone': 'default'}" openshift_schedulable=False
node03.example.com openshift_node_labels="{'region': 'primary', \
'zone': 'default'}"
node04.example.com openshift_node_labels="{'region': 'primary', \
'zone': 'default'}"
node05.example.com openshift_node_labels="{'region': 'primary', \
'zone': 'default'}"
node06.example.com openshift_node_labels="{'region': 'infra', \
'zone': 'default'}"
node07.example.com openshift_node_labels="{'region': 'primary', \
'zone': 'default'}"
node08.example.com openshift_node_labels="{'region': 'primary', \
'zone': 'default'}"
Part 2: Spark/Oshinko Installation
Install Oshinko to obtain containerized Spark with configuration settings which allow you to export Spark metrics. Use these steps on the OpenShift master node.
# oc login -u system:admin
Switch to project openshift-metrics.
# oc project openshift-metrics
Obtain and install Oshinko Source-to-Image (S2I) templates.
Important Note: the oshinko-s2i and oshinko-cli tar files you download must have the same release version, e.g. oshinko_s2i_v0.5.4.tar.gz and oshinko_v0.5.4_linux_amd64.tar.gz as shown in the example here.
# wget https://github.com/radanalyticsio/oshinko-s2i/releases/download/v0.5.4/oshinko_s2i_v0.5.4.tar.gz
# tar xvfz oshinko_s2i_v0.5.4.tar.gz
Create the Oshinko S2I templates.
# oc create -f release_templates/pythonbuild.json
# oc create -f release_templates/pythonbuilddc.json
# oc create -f release_templates/sparkjob.json
Obtain the Oshinko binary to create a cluster which exports Spark metrics in JMX format.
# wget https://github.com/radanalyticsio/oshinko-cli/releases/download/v0.5.4/oshinko_v0.5.4_linux_amd64.tar.gz
# tar xvfz oshinko_v0.5.4_linux_amd64.tar.gz
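Because the two archives must share a release version, one option is to derive both download URLs from a single variable. The sketch below only prints the URLs. The function name oshinko_urls is hypothetical, and the assumption that other releases follow the same file naming pattern as v0.5.4 is ours.

```shell
#!/bin/bash
# Sketch: build both Oshinko download URLs from one version string so the
# oshinko-s2i and oshinko-cli releases cannot drift apart.
# Assumption: other releases use the same file naming pattern as v0.5.4.

OSHINKO_VERSION=v0.5.4
BASE=https://github.com/radanalyticsio

# Print the S2I template archive URL, then the CLI archive URL.
oshinko_urls() {
  local v="$1"
  echo "${BASE}/oshinko-s2i/releases/download/${v}/oshinko_s2i_${v}.tar.gz"
  echo "${BASE}/oshinko-cli/releases/download/${v}/oshinko_${v}_linux_amd64.tar.gz"
}

oshinko_urls "$OSHINKO_VERSION"
```

To download instead of print, the output can be piped to "xargs -n 1 wget".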
Create the "metricsconfig" and "clusterconfig" configmaps to set the Spark metrics properties.
Start by creating a directory to store your Spark metrics settings (named metrics in this example). In that directory, create a text file named metrics.properties with these six lines:
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
application.source.jvm.class=org.apache.spark.metrics.source.JvmSource
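One way to create the directory and file from the shell is sketched below. The directory name metrics is an assumption, chosen to match the --from-file=metrics argument used when the configmap is created.

```shell
#!/bin/bash
# Sketch: create the settings directory and the metrics.properties file.
# Assumption: the directory is called "metrics", matching --from-file=metrics.

METRICS_DIR=metrics
mkdir -p "$METRICS_DIR"

# Quoted heredoc delimiter so the shell does not expand anything in the body.
cat > "$METRICS_DIR/metrics.properties" <<'EOF'
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
application.source.jvm.class=org.apache.spark.metrics.source.JvmSource
EOF

# Show the file so it can be compared with the six lines above.
cat "$METRICS_DIR/metrics.properties"
```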
Now, create the configmaps as shown here.
# oc create configmap metricsconfig --from-file=metrics
# oc create configmap clusterconfig --from-literal=sparkmasterconfig=metricsconfig \
--from-literal=sparkworkerconfig=metricsconfig
Run the "oc get configmaps" command to see your configmaps. The output should look like this:
# oc get configmaps
NAME DATA AGE
clusterconfig 2 5s
metricsconfig 2 5s
Use the oshinko binary from the tar file to create a Spark cluster with Prometheus metrics enabled. In this example we are creating a Spark cluster with four workers.
# ./oshinko create <yourclusternamehere> --workers=4 \
--metrics=prometheus --storedconfig=clusterconfig
Add the Spark master and Spark workers to the scrape configuration in the Prometheus config map (see Figure 1 for the scrape configuration changes). This static scrape setting needs the IP address of the master pod and of each worker pod. To find them, navigate to “Applications->Pods” in the OpenShift web user interface (UI) and click the pod name of each master and worker pod. In the future, we would like to let users label their Spark master and workers so that they are discovered automatically.
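As a rough illustration of what a static scrape entry can look like, the sketch below prints a Prometheus scrape job for a list of pod IP addresses. The job name "spark", the metrics port 7777, the function name spark_scrape_job, and the example IP addresses are all assumptions, not values from this post. Check which port your Spark pods actually expose before using it.

```shell
#!/bin/bash
# Sketch: print a static Prometheus scrape job for Spark master and worker pods.
# Assumptions (not from the original post): the job name "spark" and the
# metrics port 7777. Verify the port your Spark pods expose.

METRICS_PORT=7777

# Print one scrape_configs entry whose targets are the pod IPs passed as arguments.
spark_scrape_job() {
  echo "- job_name: 'spark'"
  echo "  static_configs:"
  echo "  - targets:"
  local ip
  for ip in "$@"; do
    echo "    - '${ip}:${METRICS_PORT}'"
  done
}

# Placeholder IPs; on a cluster, "oc get pods -o wide" lists the real pod IPs.
spark_scrape_job 10.128.0.10 10.128.0.11
```

The printed entry would go under scrape_configs in the Prometheus configuration, indented to match the existing jobs.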
Part 3: Grafana Installation
Download script to install Grafana.
# wget https://github.com/mrsiano/openshift-grafana/archive/master.zip
# unzip master.zip
Change all references to project "grafana" to project "openshift-metrics" in setup-grafana.sh and grafana-ocp.yaml.
Deploy Grafana.
# ./setup-grafana.sh prometheus-ocp openshift-metrics true
Navigate to the Grafana web UI in the OpenShift console and log in to OpenShift again. When prompted, allow access to your account by the service account grafana-ocp.
Set up the Grafana data source configuration as shown in the diagram. Specify the name as "prometheus", the type as "prometheus", and the URL as the route for Prometheus, e.g. https://prometheus-openshift-metrics.10.8.47.25.nip.io. You can look this up in the route tab in the OpenShift web UI. Check the box for "Skip TLS verification (insecure)". In the OpenShift web UI, navigate to "Resources->Secrets", copy the "token" from the bottom of the “grafana-ocp-token-xxxxx” page, and paste it into the “Prometheus settings” token field on this Grafana data source page.
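Before entering the URL and token in Grafana, it can help to confirm from a shell that Prometheus accepts them. The sketch below is one way to do that with curl against the Prometheus HTTP API. The function names prom_query_url and prom_query are hypothetical, and PROM_URL and TOKEN in the usage note are placeholders for your own route and token.

```shell
#!/bin/bash
# Sketch: check a Prometheus route and token from the command line.
# The function names are hypothetical helpers, not part of any tool.

# Build the instant-query endpoint of the Prometheus HTTP API,
# dropping one trailing slash from the base URL if present.
prom_query_url() {
  printf '%s/api/v1/query\n' "${1%/}"
}

# Run an instant query with a bearer token.
# -k skips TLS verification, like the checkbox on the data source page.
prom_query() {
  local url="$1" token="$2" query="$3"
  curl -sk -G -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=${query}" \
    "$(prom_query_url "$url")"
}

# Show the endpoint that would be queried for the example route.
prom_query_url "https://prometheus-openshift-metrics.10.8.47.25.nip.io"
```

With your own values set, running prom_query "$PROM_URL" "$TOKEN" up should return a JSON document if the route and token are accepted. If your oc client supports it, "oc sa get-token grafana-ocp -n openshift-metrics" is another way to obtain the token than copying it from the web UI.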
Part 4: Run a containerized Spark job and create Grafana dashboards to display metrics you collect
Run your Spark job using the Oshinko S2I templates. In this example we are using the template "oshinko-python-spark-build-dc". Our Python Spark source code is on GitHub. The S2I template will convert your Python code to a Linux container image.
#!/bin/bash
APPLICATION_NAME=myapplicationname
APP_FILE=myapplicationname.py
SPARK_OPTIONS=" --driver-java-options \
'-javaagent:/opt/app-root/src/agent-bond.jar=/opt/app-root/src/agent.properties' \
--executor-memory 10G --conf spark.default.parallelism=40"
GIT_URI=https://github.com/yourrepo
GIT_REF=master
OSHINKO_CLUSTER_NAME=yoursparkclustername
CONFIG_MAP=metricsconfig
APP_EXIT=false
echo "$APPLICATION_NAME"
echo "$APP_FILE"
echo "$GIT_URI"
echo "$GIT_REF"
echo "$APP_EXIT"
echo "$OSHINKO_CLUSTER_NAME"
echo "$APP_ARGS"
echo "$SPARK_OPTIONS"
oc new-app --template=oshinko-python-spark-build-dc \
--param=APPLICATION_NAME="$APPLICATION_NAME" \
--param=APP_FILE="$APP_FILE" \
--param=GIT_URI="$GIT_URI" \
--param=GIT_REF="$GIT_REF" \
--param=OSHINKO_SPARK_DRIVER_CONFIG="$CONFIG_MAP" \
--param=OSHINKO_CLUSTER_NAME="$OSHINKO_CLUSTER_NAME" \
--param=SPARK_OPTIONS="$SPARK_OPTIONS"
Now you can build your own Grafana dashboards to view metrics for your running Spark job. Here is an example dashboard displaying Spark metrics in Grafana.
Give it a try yourself. We think you'll find this very useful.
About the author
Diane Feddema is a Principal Software Engineer at Red Hat leading performance analysis and visualization for the Red Hat OpenShift Data Science (RHODS) managed service. She is also a working group chair for the MLCommons Best Practices working group and the CNCF SIG Runtimes working group.
She also creates experiments comparing different types of infrastructure and software frameworks to validate reference architectures for machine learning workloads using MLPerf™. Previously, Feddema was a performance engineer at the National Center for Atmospheric Research, NCAR, working on optimizations and tuning of parallel global climate models. She also worked at SGI and Cray on performance and compilers.
She has a bachelor's in Computer Science from the University of Iowa and master's in Computer Science from the University of Colorado.