Red Hat Blog
IBM delivers solutions designed to mitigate risk and facilitate cloud adoption. In particular, organizations deploying production IBM Db2 workloads need scalable, high-performance persistent storage that gives their applications and data full mobility. Cloud and container-based solutions must support all of their data without forcing arbitrary compromises.
The IBM Db2 team has spent the last several years transforming its delivery and infrastructure toward a Kubernetes-native Db2, tailored for hybrid and multicloud environments and managed by Red Hat OpenShift. One of the most important aspects of this transformation is integration with Red Hat OpenShift Container Storage.
In this blog post, we highlight the performance and validation testing that IBM and Red Hat Data Services engineers carried out together using Db2 Data Warehouse on the cloud (Db2WHoC) running on OpenShift Container Platform 4.3, with OpenShift Container Storage 4.3 providing persistent storage.
IBM’s Db2 has been a leading enterprise database solution for years. It delivers online transaction processing (OLTP) and data warehousing services for many enterprises with critical applications running on Red Hat platforms.
In our tests, we chose the IBM Db2 Data Warehouse massively parallel processing (MPP) option because its complex queries stress every resource: compute, memory, storage and network. It is also one of the solutions offered in IBM Cloud Pak for Data, which can consume Red Hat OpenShift Container Storage through IBM Storage Suite for IBM Cloud Paks.
On the journey to make Db2 cloud-native, Red Hat and IBM engineers have collaborated closely to evaluate and validate Red Hat OpenShift Container Storage as a software-defined storage solution for Db2. This collaboration is exemplified by two major initiatives:
An architectural initiative defines the integration and its associated performance, and reviews how protocols are managed throughout the technology stack.
A practical initiative deploys and tests Db2 and Red Hat OpenShift Container Storage together, validating scenarios that are essential to database workloads.
Big Data Insight (BDI) is an IBM workload that is modeled after a day in the life of a business intelligence application based on a retail database with in-store and online catalogs of merchandise for sale.
The workload is loosely based on the TPC-DS benchmark specification and contains seven fact tables and 17 dimension tables. The data is randomly generated each time a platform is tested. The workload has three types of "users," which run simple, intermediate and complex queries, and contains 100 different Cognos-generated queries. In our tests, we generated data for 1TB and 2TB database sizes.
For our Db2WHoC testing, we chose AWS as the public cloud platform. Several sizing considerations and calculations determined which instance type we chose for each node role.
Red Hat OpenShift Container Storage 4 is built on Red Hat Ceph Storage and supports fast devices such as NVMe drives. OpenShift Container Storage 4.3 introduced the ability to use direct-attached storage in AWS, so we chose AWS i3en.2xlarge instances, which have local NVMe storage, for our persistent storage nodes. This gives us the additional bandwidth a data warehouse workload requires.
To optimize performance while containing costs, we also separated the application layer (where the Db2 pods ran) from the storage layer (the instances that ran OpenShift Container Storage). The configuration was an eight-node OpenShift cluster: one master on an m5.xlarge instance (note: multiple master nodes are recommended for production environments), four r5a.4xlarge instances for the IBM Db2 pods, and three i3en.2xlarge instances for the persistent storage pods.
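To make the sizing concrete, the sketch below records the eight-node layout as data and totals the compute and memory available to each role. The per-instance vCPU and memory figures are AWS's published specifications for these instance types, added here for illustration (they are not stated in the test report), and `NodeGroup` and `capacity_by_role` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    role: str           # what the nodes run
    instance_type: str  # AWS instance type
    count: int          # number of instances in this group
    vcpus: int          # vCPUs per instance (AWS published spec, assumed)
    memory_gib: int     # memory per instance in GiB (AWS published spec, assumed)


# Eight-node layout from the test configuration.
CLUSTER = [
    NodeGroup("master", "m5.xlarge", 1, 4, 16),
    NodeGroup("db2", "r5a.4xlarge", 4, 16, 128),
    NodeGroup("storage", "i3en.2xlarge", 3, 8, 64),
]


def capacity_by_role(groups: list[NodeGroup]) -> dict[str, tuple[int, int]]:
    """Sum total vCPUs and memory (GiB) for each node role."""
    totals: dict[str, tuple[int, int]] = {}
    for g in groups:
        vcpus, mem = totals.get(g.role, (0, 0))
        totals[g.role] = (vcpus + g.count * g.vcpus, mem + g.count * g.memory_gib)
    return totals


if __name__ == "__main__":
    print("total nodes:", sum(g.count for g in CLUSTER))
    for role, (vcpus, mem) in capacity_by_role(CLUSTER).items():
        print(f"{role:<8} {vcpus:>3} vCPU {mem:>4} GiB")
```

Under these assumptions, the Db2 layer has 64 vCPUs and 512 GiB of memory and the storage layer has 24 vCPUs and 192 GiB, so the database pods and the storage pods do not compete for the same cores.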
Running the workload
Red Hat OpenShift Container Storage 4 provides block, object and file storage. For our testing, we used CephFS (file) RWX persistent volume claims (PVCs) to provide shared storage among the four Db2 pods. We also used a CephFS directory to hold the randomly generated test data, and loaded the data from this location into the Db2 pods through external tables. Each Db2 pod also uses a block (RWO) PVC as the storage device that holds its Db2 data.
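The storage layout above can be sketched as Kubernetes PersistentVolumeClaim manifests: one shared CephFS claim with `ReadWriteMany` access and one block claim with `ReadWriteOnce` access per Db2 pod. This is a minimal illustration, not the actual manifests from the Db2 deployment (in practice the Db2 operator or a StatefulSet would normally create the per-pod claims). The claim names and sizes are hypothetical, and `ocs-storagecluster-cephfs` and `ocs-storagecluster-ceph-rbd` are assumed to be the default storage classes created by OpenShift Container Storage; confirm them with `oc get storageclass`.

```python
import json

# Assumed default OpenShift Container Storage storage class names.
CEPHFS_SC = "ocs-storagecluster-cephfs"
RBD_SC = "ocs-storagecluster-ceph-rbd"


def pvc(name: str, storage_class: str, access_mode: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
            "storageClassName": storage_class,
        },
    }


def db2_claims(num_pods: int, shared_size: str, data_size: str) -> list[dict]:
    """One shared CephFS (RWX) claim plus one block (RWO) claim per Db2 pod."""
    claims = [pvc("db2-shared", CEPHFS_SC, "ReadWriteMany", shared_size)]
    for i in range(num_pods):
        claims.append(pvc(f"db2-data-{i}", RBD_SC, "ReadWriteOnce", data_size))
    return claims


if __name__ == "__main__":
    # Sizes are placeholders, not the values used in the tests.
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": db2_claims(4, "500Gi", "1Ti")}
    print(json.dumps(manifest, indent=2))
```

The printed JSON `List` could then be applied with `oc apply -f`.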
We first ran the workload serially, one query at a time, to measure a baseline execution time for each query. The first pass served to warm up the storage; we then ran all queries three more times from start to finish.
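One reading of the serial procedure, sketched in Python: a warm-up pass over every query whose timings are discarded, followed by three timed passes run one query at a time, with the mean time per query kept as the baseline. `run_query` is a hypothetical placeholder for whatever client call submits a query to Db2, and using the mean rather than, say, the median is an assumption.

```python
import statistics
import time
from typing import Callable


def serial_baseline(
    queries: list[str],
    run_query: Callable[[str], None],
    timed_passes: int = 3,
) -> dict[str, float]:
    """Warm up with one untimed pass, then time `timed_passes` serial passes.

    Returns the mean elapsed seconds for each query across the timed passes.
    """
    # Warm-up pass: run every query once and discard the timings.
    for q in queries:
        run_query(q)

    samples: dict[str, list[float]] = {q: [] for q in queries}
    for _ in range(timed_passes):
        # Run the full query list from start to finish, one query at a time.
        for q in queries:
            start = time.perf_counter()
            run_query(q)
            samples[q].append(time.perf_counter() - start)

    return {q: statistics.mean(times) for q, times in samples.items()}


if __name__ == "__main__":
    # Stand-in for a real Db2 call: sleep briefly instead of running SQL.
    baseline = serial_baseline(["q1", "q2"], lambda q: time.sleep(0.01))
    for q, secs in baseline.items():
        print(f"{q}: {secs:.3f} s")
```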
With a baseline established, we moved on to concurrent tests using JMeter, which shuffled the 100 queries and ran them with 16 and 32 simulated concurrent users, concentrating mainly on the intermediate and complex queries.
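The concurrent phase used JMeter; the Python sketch below only illustrates the same pattern, with a thread pool standing in for JMeter's simulated users. Each user runs the query list in its own shuffled order, and the harness records per-query latencies and the wall-clock time for the whole run. `simulated_user`, `concurrent_run` and `run_query` are hypothetical names, and restricting the list to intermediate and complex queries is left to the caller.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Timings = list[tuple[str, float]]


def simulated_user(
    user_id: int, queries: list[str], run_query: Callable[[str], None]
) -> Timings:
    """Run every query once, in an order shuffled per user, timing each one."""
    order = list(queries)
    random.Random(user_id).shuffle(order)  # seeded per user: reproducible order
    timings: Timings = []
    for q in order:
        start = time.perf_counter()
        run_query(q)
        timings.append((q, time.perf_counter() - start))
    return timings


def concurrent_run(
    users: int, queries: list[str], run_query: Callable[[str], None]
) -> tuple[list[Timings], float]:
    """Start `users` simulated users at once; return their timings and wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(simulated_user, u, queries, run_query) for u in range(users)
        ]
        results = [f.result() for f in futures]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    queries = [f"q{i}" for i in range(10)]
    for users in (16, 32):
        results, elapsed = concurrent_run(users, queries, lambda q: time.sleep(0.001))
        done = sum(len(r) for r in results)
        print(f"{users} users: {done} queries in {elapsed:.2f} s")
```

Threads are adequate for this pattern because each worker spends most of its time waiting on the database rather than executing Python code.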
To evaluate the performance of IBM Db2 running on OpenShift with OpenShift Container Storage, we compared our test environment to other platforms (both on-premises and cloud) where IBM had run the same workload.
Because the comparison tests used a larger workload and more compute and memory, this is not an apples-to-apples comparison, so we normalized the results to provide a meaningful workload characterization. Based on these normalized results, IBM Db2 Warehouse running on Red Hat OpenShift with OpenShift Container Storage delivers excellent performance. In addition, while some other platforms took IBM considerable time to configure and even failed some of the test scenarios, Red Hat OpenShift Container Storage was easy to install and needed minimal configuration to achieve deterministic performance results.
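The post does not spell out how the results were normalized. As a purely hypothetical illustration of the idea, the sketch below divides throughput (queries per hour) by the number of vCPUs so that runs on differently sized clusters can be compared. It does not account for different data sizes, and it should not be read as the method IBM and Red Hat actually used; `RunResult` and `relative_to` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    platform: str
    queries_completed: int
    elapsed_s: float  # wall-clock duration of the run in seconds
    vcpus: int        # total vCPUs serving the database

    @property
    def queries_per_hour(self) -> float:
        return self.queries_completed * 3600 / self.elapsed_s

    @property
    def qph_per_vcpu(self) -> float:
        """Throughput normalized by compute (hypothetical normalization)."""
        return self.queries_per_hour / self.vcpus


def relative_to(baseline: RunResult, other: RunResult) -> float:
    """Ratio of per-vCPU throughput; above 1.0 means `other` does more work per vCPU."""
    return other.qph_per_vcpu / baseline.qph_per_vcpu
```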
"With no prior experience with OpenShift Container Storage, our team was able to set up two distinct OpenShift clusters and conduct full Db2 Warehouse Performance validation in less than two weeks. This, by far, exceeded our expectations, as tests involving other Kubernetes storage services proved to be much more difficult to configure and validate.
Additionally, OpenShift Container Storage has outperformed these other cloud-native storage solutions in all tested scenarios when using identical hardware. We have been delighted with the ease of use and outstanding performance of OpenShift Container Storage, prompting us to consider it our preferred data platform for running Db2 Warehouse on OpenShift."
- Piotr Mierzejewski, Director, Db2 Development, IBM Data & AI
The BDI workload we tested is one of the key performance testing harnesses that IBM's Db2 performance team uses when testing storage subsystems.
Our tests demonstrate that Red Hat OpenShift Container Storage provides performance at scale for production Db2 Warehouse MPP workloads running on OpenShift container clusters. Red Hat OpenShift Container Platform scaled predictably across all tested workloads.
For an in-depth discussion of our tests and results, see the workload characterization study on the detail page for IBM Db2 Warehouse MPP on OpenShift Container Storage.
About the authors
Sagy Volkov is a former performance engineer at ScaleIO, where he started the performance engineering group and the ScaleIO enterprise advocates group and, reporting to the CTO and founder of ScaleIO, architected the ScaleIO storage appliance. He is now with Red Hat as a storage performance instigator, concentrating on application performance (mainly databases and CI/CD pipelines) and application resiliency on Rook/Ceph.
He has previously spoken at Cloud Native Storage Day (CNS), DevConf 2020, EMC World and the Red Hat booth at KubeCon.
Loïc Julien is a Senior Technical Staff Member / Databases Deployment in IBM Cloud, Data & AI.