In Part 2 of this series, we created one WordPress instance connected to a MySQL server on top of Red Hat OpenShift Container Platform. Now it's time to scale from one deployment to many, making better use of our available resources. Once we've done that, we'll revisit a failure scenario to illustrate the effect different storage backends can have.
OpenShift on AWS test environment
All posts in this series use a Red Hat OpenShift Container Platform on AWS setup that includes 8 EC2 instances deployed as 1 master node, 1 infra node, and 6 worker nodes that also run Red Hat OpenShift Container Storage Gluster and Heketi pods.
The 6 worker nodes serve as both the storage providers and the persistent storage consumers (MySQL). The OpenShift Container Storage worker nodes are of instance type m5.2xlarge with 8 vCPUs and 32 GB of memory; each node has 3 x 100 GB gp2 volumes attached for OCP and one 1 TB gp2 volume for the OCS storage cluster.
The AWS region us-west-2 has availability zones (AZs) us-west-2a, us-west-2b, and us-west-2c, and the 6 worker nodes are spread across the 3 AZs, 2 nodes in each. This means the OCS storage cluster is stretched across these 3 AZs. Below is a view from the AWS console showing the EC2 instances and how they are placed in the us-west-2 AZs.
WordPress/MySQL setup
In Part 2 of this series, we showed how to use a stateful set to create one WordPress/MySQL project. One deployment on a 6-node cluster is not a typical use case, however. To take our example to the next level, we will now create 60 identical projects, each running one WordPress and one MySQL pod.
The RAM available in our cluster is why we chose 60 deployments: every compute node is equipped with 32 GB of RAM, so 60 instances, each using 2 GB for its MySQL pod, will consume 120 GB of the 192 GB available overall. That leaves enough memory for the OpenShift cluster itself and the WordPress pods.
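Creating the projects themselves is easy to script. Here is a minimal sketch, assuming the MySQL stateful set and WordPress objects from Part 2 have been saved to local YAML files (the file names here are hypothetical):

```
#!/bin/bash
# Minimal sketch: create 60 identical WordPress/MySQL projects.
# mysql-ocs.yaml and wordpress.yaml are hypothetical files holding the
# StatefulSet/service and WordPress build objects used in Part 2.
for i in $(seq 1 60); do
  oc new-project wp-$i
  oc create -f mysql-ocs.yaml -n wp-$i
  oc create -f wordpress.yaml -n wp-$i
done
```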
Once everything is deployed, we can verify the project count:

```
oc get projects | grep wp | wc -l
60
```
A closer look at project wp-1 shows us that it's identical to what we used earlier:
```
oc project wp-1
oc get all
NAME                    READY   STATUS      RESTARTS   AGE
pod/mysql-ocs-0         1/1     Running     0          10m
pod/wordpress-1-6jmkt   1/1     Running     0          10m
pod/wordpress-1-build   0/1     Completed   0          10m

NAME                                DESIRED   CURRENT   READY   AGE
replicationcontroller/wordpress-1   1         1         1       10m

NAME                                                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/glusterfs-dynamic-81b7b6cf-3f46-11e9-a504-02e7350e98d2   ClusterIP   172.30.176.74   <none>        1/TCP               10m
service/mysql-ocs                                                ClusterIP   172.30.27.4     <none>        3306/TCP            10m
service/wordpress                                                ClusterIP   172.30.23.152   <none>        8080/TCP,8443/TCP   10m

NAME                         DESIRED   CURRENT   AGE
statefulset.apps/mysql-ocs   1         1         10m

NAME                                           REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/wordpress   1          1         1         config,image(wordpress:latest)

NAME                                       TYPE     FROM   LATEST
buildconfig.build.openshift.io/wordpress   Source   Git    1

NAME                                   TYPE     FROM          STATUS     STARTED          DURATION
build.build.openshift.io/wordpress-1   Source   Git@4094d36   Complete   10 minutes ago   20s

NAME                                       DOCKER REPO                                       TAGS     UPDATED
imagestream.image.openshift.io/wordpress   docker-registry.default.svc:5000/wp-1/wordpress   latest   10 minutes ago

NAME                                 HOST/PORT                                           PATH   SERVICES    PORT       TERMINATION   WILDCARD
route.route.openshift.io/wordpress   wordpress-wp-1.apps.ocpocs311.sagyocpocsonaws.com          wordpress   8080-tcp                 None
```
Because configuring 60 WordPress instances can be tedious, we automated the process, using curl in a bash script:
```
#!/bin/bash
START=1
END=60   # number of WordPress / MySQL projects

# set up the WordPress instances to attach to the corresponding MySQL DBs
function configure {
  echo Configuring host: $HOST
  curl -c /tmp/cookie $1/wp-admin/setup-config.php?step=1 > /dev/null 2>&1
  curl -b /tmp/cookie --data "dbname=wordpress&uname=admin&pwd=secret&dbhost=mysql-ocs&prefix=wp_&submit=Submit" $1/wp-admin/setup-config.php?step=2 > /dev/null 2>&1
  curl -b /tmp/cookie --data "weblog_title=Title&user_name=admin&admin_password=secret&pass1-text=secret&admin_password2=secret&pw_weak=on&admin_email=admin%40somewhere.com&Submit=Install+WordPress&language=en_US" $1/wp-admin/install.php?step=2 > /dev/null 2>&1
}

# get all the hosts we need to configure
for (( i=$START; i<=$END; i++ ))
do
  echo Sleeping for 2 minutes to allow pods to come up...
  sleep 120
  HOST=$(oc get route wordpress -n wp-$i | grep -v NAME | cut -d " " -f 4)
  configure $HOST
done
```
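Note the two-minute sleep at the top of each iteration: it gives the WordPress and MySQL pods in project wp-$i time to reach the Running state before curl drives the WordPress installer through its setup-config and install steps.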
We now have our 60 projects running on Gluster-backed storage, with one GlusterFS volume per deployment. Each of these 60 projects comprises a namespace (a synonym for project in OpenShift terminology). Each namespace contains 2 pods (one WordPress pod, one MySQL pod) and one Persistent Volume Claim (PVC). The PVC is the storage on which MySQL keeps the database contents.
```
oc get project | grep wp | wc -l
60
```
Therefore, we have 60 projects up and running, each of which has its own PVC:
```
oc get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
mysql-ocs-data-mysql-ocs-0   Bound    pvc-81b7b6cf-3f46-11e9-a504-02e7350e98d2   8Gi        RWO            glusterfs-storage   15m
```
Failure scenario 1: WordPress/MySQL backed by OpenShift Container Storage
Now we want to see how long it takes a higher number of our WordPress/MySQL deployments to be restarted after a simulated node/instance failure. To do that, we will, again, cordon one of our nodes and delete the pods running on that node:
```
oc get nodes | grep compute
NAME                                          STATUS   ROLES     AGE   VERSION
ip-172-16-26-120.us-west-2.compute.internal   Ready    compute   28d   v1.11.0+d4cacc0
ip-172-16-27-161.us-west-2.compute.internal   Ready    compute   28d   v1.11.0+d4cacc0
ip-172-16-39-190.us-west-2.compute.internal   Ready    compute   28d   v1.11.0+d4cacc0
ip-172-16-44-7.us-west-2.compute.internal     Ready    compute   28d   v1.11.0+d4cacc0
ip-172-16-53-212.us-west-2.compute.internal   Ready    compute   28d   v1.11.0+d4cacc0
ip-172-16-56-45.us-west-2.compute.internal    Ready    compute   28d   v1.11.0+d4cacc0
```
That done, let's find all MySQL pods on one of the compute nodes listed above (there should be 10 pods on each compute node):
```
oc adm manage-node ip-172-16-26-120.us-west-2.compute.internal --list-pods | grep -i mysql

Listing matched pods on node: ip-172-16-26-120.us-west-2.compute.internal

wp-1    mysql-ocs-0   1/1   Running   0   13m
wp-13   mysql-ocs-0   1/1   Running   0   12m
wp-20   mysql-ocs-0   1/1   Running   0   12m
wp-22   mysql-ocs-0   1/1   Running   0   12m
wp-27   mysql-ocs-0   1/1   Running   0   12m
...omitted
```
So these are the pods running on the node we will cordon. Similar to the method we used in Part 2, we set up a monitoring routine that continuously retrieves the HTTP status of each WordPress site's start page. Now we cordon the node ip-172-16-26-120.us-west-2.compute.internal and then delete all MySQL pods on it, using the following script:
```
#!/bin/bash
TARGET=$1

# get the namespaces in which the mysql pods live
NAMESPACES=$(oc adm manage-node $TARGET --list-pods 2>&1 | grep -i mysql | awk '{print $1}')

# cordon the node
echo Cordoning $TARGET
oc adm cordon $TARGET

# force delete the pods to simulate a node failure
echo Deleting mysql pods on $TARGET
for NAME in $NAMESPACES
do
  oc delete pod mysql-ocs-0 -n $NAME --force --grace-period=0
done
```
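For reference, the monitoring routine mentioned above could look like the following minimal sketch (our own reconstruction, not the exact script we used). It polls the start page of every WordPress route once per second and logs any non-200 response:

```
#!/bin/bash
# Minimal monitoring sketch (a reconstruction): fetch every WordPress
# start page once per second and log any non-200 HTTP status code.
while true; do
  for i in $(seq 1 60); do
    HOST=$(oc get route wordpress -n wp-$i -o jsonpath='{.spec.host}')
    CODE=$(curl -s -o /dev/null -w '%{http_code}' http://$HOST/)
    if [ "$CODE" != "200" ]; then
      echo "$(date +%H:%M:%S) wp-$i HTTP $CODE"
    fi
  done
  sleep 1
done
```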
In the tests we performed, we used 60 WordPress/MySQL instances, distributed 10 per compute node. Our monitoring script recorded the time between the first observed failure on any of those 10 WordPress instances and the last one. We ran 5 identical tests and averaged the time it took for all 10 instances to be fully functional again: 20 seconds.
In other words, from the first pod failure to the last pod recovery took as little as 20 seconds. This is not the time a single MySQL pod takes to restart, but rather the total recovery time for all the failed pods. After this interval, all the MySQL pods using GlusterFS storage are back up, and the start page of every WordPress site returns a successful HTTP status again.
Note: For a higher number of MySQL pods, it may be necessary to increase fs.aio-max-nr on the compute nodes. Details about the reason and the solution can be found here.
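As a hedged example (the value below is illustrative; size it to your pod count), the limit can be raised persistently with sysctl on each compute node:

```
# Check the current async I/O request limit (run as root on the node).
sysctl fs.aio-max-nr
# Raise it persistently; 1048576 is an example value, not a recommendation.
echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/99-mysql-aio.conf
sysctl -p /etc/sysctl.d/99-mysql-aio.conf
```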
Failure scenario 2: WordPress/MySQL backed by Amazon’s EBS volumes
The next step is to redo the tests on a different storage backend. We chose Amazon EBS storage. This type of storage comes with a few limitations compared to the Gluster-based backend we used earlier:
- First and most important to our tests is that EBS volumes cannot migrate between AZs. For our testing, that means pods can only migrate between two nodes, as we only have two OCP nodes per AZ (a quick way to inspect this AZ pinning is shown after this list).
- Furthermore, we can only attach up to 40 EBS volumes to one node because of Linux-specific volume limits; beyond this limit, attachment may not work as expected (see AWS instance volume limits).
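As a quick way to see the AZ pinning for yourself (a hedged example; the label key shown is the beta one used in the OCP 3.11 / Kubernetes 1.11 era), EBS-backed PVs carry a zone label:

```
# List each PV together with the AZ it is pinned to.
oc get pv -o custom-columns='NAME:.metadata.name,ZONE:.metadata.labels.failure-domain\.beta\.kubernetes\.io/zone'
```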
To make the setup as comparable as possible to the OCS-backed test in Scenario 1, we decided to stick to 10 WordPress/MySQL instances per node. Our testing also followed the same steps as in Scenario 1:
- Set up monitoring for the WordPress instances.
- Cordon the node that runs the pods.
- Delete the MySQL pods on the node.
- Record the time it takes for all WordPress instances to be functional again.
- Un-cordon the node (a single command, shown below).
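The final step is a single command, shown here for the node used in Scenario 1:

```
# Re-enable scheduling on the node after the test.
oc adm uncordon ip-172-16-26-120.us-west-2.compute.internal
```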
While re-instantiating all 10 MySQL pods took about 20 seconds on OCS, the 5 identical tests we ran on EBS storage showed an average of 378 seconds for 10 similar pods.
Conclusion
The tests in this post show a few things:
- OpenShift Container Storage can provide faster failover and recovery times than native EBS storage in the event of a node failure, which can translate into higher availability for your application. Additionally, if we had configured only one node per AZ (which is quite common), the EBS-only setup would have suffered an outage; that is not the case with OpenShift Container Storage, because it provides high availability across AZs. It's worth noting that OpenShift Container Storage is itself backed by EBS volumes, but the abstraction layer that GlusterFS introduces can reduce the time needed to re-attach the volume inside the MySQL pod after a failure.
| Storage backend | Time to failover 10 MySQL pods |
| --- | --- |
| OpenShift Container Storage | 20 seconds |
| EBS volumes | 378 seconds |
- OpenShift Container Storage spans storage availability across different AZs and helps increase the reliability of the OpenShift cluster.
- Using OpenShift Container Storage makes the EBS instance volume limit less problematic, as a smaller number of larger volumes can host the required persistent volumes. Again, this is a benefit of the GlusterFS abstraction layer introduced by deploying OpenShift Container Storage.
The next blog post in this series will use the SysBench 0.5 database testing tool to measure MySQL read/write performance on OCS. Since real tuning is scale driven, that post will feature many (60) small (10 GB) MySQL databases, with results published for both RWX (GlusterFS volumes) and RWO (GlusterBlock volumes). The failure scenario from this blog post will also be repeated, this time under SysBench read/write load.