Software Design Pattern Benchmarking

September 22, 2021 | 4-minute read | Hybrid cloud

Preface

When evaluating storage performance, whether local disks or OpenShift Container Storage on a hyper-converged infrastructure, or traditional storage, the two most common types of performance benchmarking are:

Application performance - a mix of different IO operations that can contain a variety of block sizes
Generic performance - a single block size that does a specific operation

To clarify, by IO operation I mean a read, write, or mixed operation with a random, sequential, or cache-hit IO pattern. Before approaching any type of benchmarking, we must first understand the capabilities of our hardware and the behavior of our application:

Hardware

Disk speed, number of disks, and RAID level, if any (RAID write penalty)
CPU and RAM clock speeds and architectures
Number of lanes and bus speed, which determine your maximum achievable bandwidth
BIOS version/DIMM layout should be configured according to vendor best practices.
NIC speeds and latency.
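The RAID penalty item above can be turned into a quick capacity estimate. The following is a minimal sketch, assuming the commonly quoted rule-of-thumb write penalties (2 for RAID 1/10, 4 for RAID 5, 6 for RAID 6) and ignoring controller cache effects; the function name and the 8-disk, 200-IOPS array are hypothetical.

```python
# Commonly quoted rule-of-thumb write penalties: back-end IOs per front-end write.
# Real arrays with write-back cache or full-stripe writes can do better.
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def frontend_iops(disks: int, iops_per_disk: float, read_fraction: float, raid: str) -> float:
    """Estimate the front-end IOPS an array can sustain for a given read/write mix.

    Each front-end read costs one back-end IO and each front-end write costs
    `penalty` back-end IOs, so front-end IOPS = raw IOPS / (r + w * penalty).
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    return raw_iops / (read_fraction + write_fraction * RAID_WRITE_PENALTY[raid])


if __name__ == "__main__":
    # Hypothetical array: 8 disks rated at 200 IOPS each, 70% reads / 30% writes.
    for level in ("RAID10", "RAID5", "RAID6"):
        print(level, round(frontend_iops(8, 200, 0.7, level)))
```

For that hypothetical array, the same 1600 raw IOPS shrink to roughly 1231, 842, and 640 front-end IOPS for RAID 10, RAID 5, and RAID 6, which is why the RAID level belongs on the hardware checklist.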

Application

How many CPU cores does it consume while idle/peak/bursts?
Is it NUMA aware (QPI traffic)?
How much RAM does it consume?
How efficient is the application itself?
How much throughput/IOPS does it generate?

Common Example for Hardware and Application Interaction

You just upgraded your hardware to the latest-generation CPUs, with two sockets on board, and your application's performance dropped by 30%. Although each socket is much faster than before, on current architectures it is still much more expensive (up to 7x) to access resources on the remote socket (remote NUMA access). So just by pinning your application to a specific NUMA node, you may greatly improve application performance and turn that -30% into +30% (assuming a CPU bottleneck).
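On Linux, NUMA pinning is usually done from the command line with `numactl --cpunodebind=0 --membind=0 <command>`. The following is a minimal Python sketch of the CPU half of that, assuming Linux, where each node's CPUs are listed in `/sys/devices/system/node/node<N>/cpulist`; `parse_cpulist` and `pin_to_numa_node` are hypothetical helper names. Note that it only restricts CPU affinity: memory placement is left to the kernel's default policy, which generally allocates on the local node, whereas `numactl --membind` enforces it.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node: int, pid: int = 0) -> set[int]:
    """Restrict a process (pid 0 = the caller) to the CPUs of one NUMA node (Linux only)."""
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    os.sched_setaffinity(pid, cpus)  # raises OSError if none of these CPUs are allowed
    return cpus
```

Calling `pin_to_numa_node(0)` at process start keeps all of its threads on node 0's cores; rerunning the benchmark pinned and unpinned is a quick way to see whether cross-socket traffic is the bottleneck.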

Application Modeling

I strongly believe that modeling your application workloads and then scaling them is a better methodology than classic performance testing alone. In this document, I will demonstrate how I modeled different application workloads using VDbench. That being said, classic workloads still have their place: they remain a great tool for finding corner cases.

Tooling

VDbench, similar to FIO, is a well-known IO generator within the storage community. However, VDbench has many useful features that make it ideal for modeling applications. It also supports a variety of operating systems, which makes it a great tool for apples-to-apples comparisons across operating systems. It is free and can be downloaded here.

Databases and Application Pattern Modeling

Different applications generate different block sizes across various operations. Therefore, profiling the patterns correctly is a key factor in modeling your application. There are various ways to do application profiling, but they are not within the scope of this document. The table below shows the breakdown of the application patterns I am currently using. These are not tied to a specific application; rather, they represent common ground between various databases and applications that I have profiled over the years:

Database/App type	Used in	IO size	Random/sequential	Read/write	Percentage (%)
OLTP1	Mail applications, Online Transaction Processing	4KB	random	read hit	10
		4KB	random	read	35
		4KB	random	write	35
		4KB	sequential	read	5
		4KB	sequential	write	15
OLTP2	Small Oracle applications, lightweight transactions	8KB	random	read hit	20
		8KB	random	read	45
		8KB	random	write	15
		64KB	sequential	read	10
		64KB	sequential	write	10
OLTPHW	Large Oracle applications, heavyweight transactions	8KB	random	read hit	10
		8KB	random	read	35
		8KB	random	write	35
		64KB	sequential	read	5
		64KB	sequential	write	15
ODSS2	Data warehouse applications, backup applications	4KB	random	read	15
		4KB	random	write	5
		64KB	sequential	read	70
		64KB	sequential	write	10
ODSS128	Streaming applications, backup applications	64KB	random	read hit	18
		64KB	random	read	18
		64KB	random	write	4
		128KB	sequential	read	48
		128KB	sequential	write	12
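Because each pattern is a weighted mix, it helps to sanity-check the table before turning it into config files. The following is a minimal sketch, not part of the original workflow: it transcribes the table into a hypothetical `PATTERNS` structure, verifies that every pattern's percentages add up to 100, and derives the read share, random share, and operation-weighted mean IO size. "Read hit" rows are counted as reads.

```python
# Each row: (io_size_kb, pattern, operation, percentage), transcribed from the table above.
# "read hit" rows are counted as reads when computing the read share.
PATTERNS = {
    "OLTP1": [
        (4, "random", "read hit", 10), (4, "random", "read", 35), (4, "random", "write", 35),
        (4, "sequential", "read", 5), (4, "sequential", "write", 15),
    ],
    "OLTP2": [
        (8, "random", "read hit", 20), (8, "random", "read", 45), (8, "random", "write", 15),
        (64, "sequential", "read", 10), (64, "sequential", "write", 10),
    ],
    "OLTPHW": [
        (8, "random", "read hit", 10), (8, "random", "read", 35), (8, "random", "write", 35),
        (64, "sequential", "read", 5), (64, "sequential", "write", 15),
    ],
    "ODSS2": [
        (4, "random", "read", 15), (4, "random", "write", 5),
        (64, "sequential", "read", 70), (64, "sequential", "write", 10),
    ],
    "ODSS128": [
        (64, "random", "read hit", 18), (64, "random", "read", 18), (64, "random", "write", 4),
        (128, "sequential", "read", 48), (128, "sequential", "write", 12),
    ],
}


def summarize(rows):
    """Validate that a pattern's percentages add up to 100 and derive summary figures."""
    total = sum(pct for *_, pct in rows)
    if total != 100:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return {
        "read_pct": sum(pct for _, _, op, pct in rows if op.startswith("read")),
        "random_pct": sum(pct for _, pattern, _, pct in rows if pattern == "random"),
        # Operation-weighted mean IO size in KB.
        "mean_io_kb": sum(size * pct for size, *_, pct in rows) / 100,
    }


if __name__ == "__main__":
    for name, rows in PATTERNS.items():
        print(name, summarize(rows))
```

OLTP2 and OLTPHW come out with the same mean IO size (19.2 KB) and the same 80% random share, yet OLTP2 is 75% reads while OLTPHW is 50% reads. A single "average block size" test would not distinguish them, which is the point of modeling the full mix.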

Short Disclaimer:
Note that in my modeling, I only used the dominant block sizes observed.

For example, for OLTP1 I used five streams of 4KB blocks, but when profiling workloads such as OLTP1, you will see many more block sizes, such as 8, 16, 28, 64, 120, and 512 KB, as well as sub-kilobyte sizes such as 0.2, 0.44, and 0.68 KB, among others. However, these block sizes occur inconsistently and vary widely, so to get consistent, repeatable results, I only use the main block sizes in play. The same applies to every other application pattern in the table above.

The block sizes I decided to use make up the majority of the workload. For example, in OLTP1, 4KB operations accounted for ~90% of the total workload. Real application data will also be compressible and dedupable to some degree, which will affect your performance if your storage uses compression or deduplication. That is a completely different topic, but I will point out that VDbench supports generating compressible and dedupable data.
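The selection rule described above (keep the main block sizes, drop the long tail) can be made explicit. The following is one possible formalization, a minimal sketch assuming you already have IO counts per block size from a profiling run; the function name, the 90% default, and the sample histogram are hypothetical. Keys can just as well be `(size, operation, pattern)` tuples if you want to select table rows rather than bare sizes.

```python
def main_block_sizes(histogram, coverage=0.9):
    """Keep the most frequent block sizes until they cover `coverage` of all IOs,
    then rescale their shares so they sum to 100%."""
    total = sum(histogram.values())
    if total == 0:
        return {}
    kept = {}
    covered = 0
    for size, count in sorted(histogram.items(), key=lambda item: item[1], reverse=True):
        if covered / total >= coverage:
            break
        kept[size] = count
        covered += count
    return {size: 100 * count / covered for size, count in kept.items()}


if __name__ == "__main__":
    # Hypothetical IO counts per block size (KB) from a profiling run.
    profile = {4: 9000, 8: 300, 64: 250, 16: 200, 0.44: 150, 512: 100}
    print(main_block_sizes(profile))        # only 4 KB is needed for 90% coverage
    print(main_block_sizes(profile, 0.95))  # 4, 8 and 64 KB, rescaled to 100%
```

Rescaling the kept entries to 100% mirrors what the table does: the rare sizes are dropped and the remaining streams absorb their share, trading a little fidelity for repeatable results.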

Simple Application Pattern Modeling

Databases are not the only workloads that can be profiled and simulated. Sometimes you will have many users running desktop applications, such as Microsoft Office. Since these are simple applications, we can also accurately predict the average amount of RAM and CPU they will consume. In my case, I calculated the following consumption:

Each Microsoft Excel instance consumes 1 core and 200 MiB + (3 * file size) of RAM. For example, a user working on a 7 MiB file will consume 1 core and 221 MiB of RAM.
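The per-instance formula above scales linearly with the number of users. The following is a minimal sizing sketch, assuming one Excel instance per user and the same file size for everyone; `excel_footprint` and `desktop_pool` are hypothetical helper names.

```python
def excel_footprint(file_size_mib, base_mib=200, file_multiplier=3):
    """Per-instance estimate from the profiling above: 1 core and
    base_mib + file_multiplier * file size of RAM (MiB)."""
    cores = 1
    ram_mib = base_mib + file_multiplier * file_size_mib
    return cores, ram_mib


def desktop_pool(users, file_size_mib):
    """Total cores and RAM (MiB), assuming one Excel instance per user (hypothetical)."""
    cores, ram_mib = excel_footprint(file_size_mib)
    return users * cores, users * ram_mib


if __name__ == "__main__":
    print(excel_footprint(7))   # (1, 221)
    print(desktop_pool(50, 7))  # (50, 11050)
```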

Here is an example of a profiled Microsoft Excel pattern:

Application	Used in	IO size	Random/sequential	Read/write	Percentage (%)
Microsoft Excel	User desktops	52KB	random	write	55
		64KB	random	write	40
		6MiB	random	read	5

Config File Examples

VDbench Databases Config Files for Filesystem

These config files are currently set to run on Windows. To run them on a UNIX/Linux-based OS, just modify the paths in the hd (host definition) and fsd (filesystem storage definition) parameters.

A few notes:

The first test that runs is actually a “fillup,” which fills the files with random data. This gives the application patterns real data to read.
All application patterns are set to run with threads=1, which in my setup keeps the queue depth low to yield the lowest latency.

The above workload examples can be found here.
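The relationship between threads, queue depth, and latency follows Little's law: outstanding IOs = IOPS x average latency. The following is a minimal sketch, assuming each VDbench thread keeps at most one IO outstanding so that threads approximate queue depth; the function names and device figures are hypothetical. Raising the queue depth increases IOPS only until the device saturates; past that point, extra outstanding IOs show up as added latency instead.

```python
def iops_at_queue_depth(queue_depth, latency_ms):
    """Little's law: outstanding IOs = IOPS x latency, so IOPS = queue depth / latency."""
    return queue_depth / (latency_ms / 1000.0)


def latency_at_queue_depth(queue_depth, iops):
    """The same relation solved for the average latency in milliseconds."""
    return queue_depth / iops * 1000.0


if __name__ == "__main__":
    # Hypothetical device: 0.5 ms per IO at queue depth 1.
    print(iops_at_queue_depth(1, 0.5))       # about 2000 IOPS
    print(latency_at_queue_depth(8, 16000))  # about 0.5 ms if 8 IOs in flight yield 16000 IOPS
```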

VDbench Databases Config Files for RAW Disk

Note that raw config files are very different from the filesystem config files. The above workload examples can be found here.
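To show how a table row maps onto a raw-disk workload, the following is a minimal sketch that generates a VDbench parameter file for OLTP2. It is not the author's config file: the generator is hypothetical, and the keywords it emits (`sd`, `lun`, `threads`, `wd`, `xfersize`, `rdpct`, `seekpct`, `rhpct`, `skew`, `rd`, `iorate`, `elapsed`, `interval`) follow my reading of the VDbench user guide, so verify them against your VDbench version. Each table row becomes one workload definition whose `skew` is its percentage, `seekpct=100` means random and `0` means sequential, and "read hit" rows get `rhpct=100`. Be aware that running a raw workload with writes overwrites the data on the target device.

```python
# OLTP2 rows from the table above: (io_size_kb, pattern, operation, percentage).
OLTP2 = [
    (8, "random", "read hit", 20),
    (8, "random", "read", 45),
    (8, "random", "write", 15),
    (64, "sequential", "read", 10),
    (64, "sequential", "write", 10),
]


def vdbench_raw_config(rows, lun, elapsed=600, interval=5):
    """Emit one storage definition, one workload definition per row, and one run
    definition. Keyword names are assumptions; check them against your VDbench docs."""
    lines = [f"sd=sd1,lun={lun},threads=1"]
    for i, (size_kb, pattern, op, pct) in enumerate(rows, start=1):
        params = [
            f"wd=wd{i}",
            "sd=sd1",
            f"xfersize={size_kb}k",
            f"rdpct={0 if op == 'write' else 100}",
            f"seekpct={100 if pattern == 'random' else 0}",  # 100 = random, 0 = sequential
            f"skew={pct}",  # this workload's share of the total IO rate
        ]
        if op == "read hit":
            params.append("rhpct=100")  # every read in this workload is a cache hit
        lines.append(",".join(params))
    lines.append(f"rd=rd1,wd=wd*,iorate=max,elapsed={elapsed},interval={interval}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical Linux device; on Windows a raw lun typically looks like \\.\PhysicalDrive1.
    print(vdbench_raw_config(OLTP2, "/dev/sdb"))
```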

VDbench Microsoft Excel Config File for Filesystem

Note that the fwdrate parameter (filesystem operations per second) is set to 1, meaning that I am simulating a single user. Also, the fsd (filesystem storage definition) is set to 7 files, meaning our user is working on 7 different files, each 40 MiB in size. The above workload examples can be found here.
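If fwdrate=1 stands for one user, then each simulated user issues one operation per second drawn from the Excel mix. The following is a minimal sketch of what that means in bytes, assuming KB is read as KiB, 6 MiB as 6144 KiB, and that load scales linearly with the number of users (for example by raising fwdrate to N for N users); the helper names are hypothetical. It shows that although reads are only 5% of operations, the 6 MiB reads carry about 85% of the transferred bytes.

```python
# Microsoft Excel mix from the profiled table: (io_size_kib, operation, percentage).
# Assumption: KB is read as KiB and 6 MiB as 6144 KiB.
EXCEL_MIX = [(52, "write", 55), (64, "write", 40), (6 * 1024, "read", 5)]


def mean_io_kib(mix):
    """Operation-weighted mean IO size in KiB."""
    return sum(size * pct for size, _, pct in mix) / 100


def read_byte_share(mix):
    """Fraction of transferred bytes that are reads (as opposed to fraction of operations)."""
    total = sum(size * pct for size, _, pct in mix)
    reads = sum(size * pct for size, op, pct in mix if op == "read")
    return reads / total


def throughput_mib_s(mix, users, ops_per_user_per_s=1.0):
    """Aggregate throughput, assuming each user issues ops_per_user_per_s operations."""
    return mean_io_kib(mix) * ops_per_user_per_s * users / 1024


if __name__ == "__main__":
    print(round(mean_io_kib(EXCEL_MIX), 1))            # 361.4 KiB per operation
    print(round(read_byte_share(EXCEL_MIX), 2))        # 0.85 of bytes are reads
    print(round(throughput_mib_s(EXCEL_MIX, 100), 1))  # 35.3 MiB/s for 100 users
```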

VDbench Generic Config Files for Filesystem

A few examples of generic performance tests, for both single-block-size and mixed workloads, can be found here.

FIO Databases Config Files

The above database workloads can also be run with fio. However, note that to replicate the percentage ratio of the different blocks within each pattern, I used fio's “flow” option, which is a bit buggy and does not currently work properly. The above workload examples can be found here.

About the author

Boaz Ben Shabat
