Preface

When looking into storage performance, whether for local disks or OpenShift Container Storage, on hyper-converged infrastructure or on traditional storage, these are the most common types of performance benchmarking:

  • Application performance - a mix of different IO operations that can span a variety of block sizes
  • Generic performance - a single block size performing one specific operation

Just for clarification: by IO operation, I mean read, write, or mixed operations with a random, sequential, or cache-hit IO pattern. Before approaching any type of benchmarking, we must first know the capabilities of our hardware:

Hardware

  • Disk speed, number of disks, and the RAID level, if any (RAID penalty)
  • CPU and RAM clock speeds and architectures
  • Number of lanes and bus speed, which affect your overall possible bandwidth
  • BIOS version and DIMM layout, which should be configured according to vendor best practices
  • NIC speeds and latency.
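To make the RAID penalty item concrete, here is a minimal sketch of the commonly used rule of thumb for estimating front-end IOPS once the RAID write penalty is taken into account. The disk count, per-disk IOPS, and write ratio in the usage lines are made-up illustrative values, not measurements, and the function name is hypothetical.

```python
# Commonly cited back-end IO cost of one front-end write per RAID level.
RAID_WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disks, iops_per_disk, write_fraction, raid_level):
    """Estimate front-end IOPS for a RAID group (rule of thumb, not a measurement).

    write_fraction is the share of front-end IOs that are writes (0.0 to 1.0).
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0.0 and 1.0")
    penalty = RAID_WRITE_PENALTY[raid_level]
    raw_iops = disks * iops_per_disk
    # Each read costs 1 back-end IO, each write costs `penalty` back-end IOs.
    cost_per_io = (1.0 - write_fraction) + write_fraction * penalty
    return raw_iops / cost_per_io


if __name__ == "__main__":
    # Hypothetical group: 8 disks at 200 IOPS each, 50% writes.
    for level in ("raid0", "raid10", "raid5", "raid6"):
        print(level, round(effective_iops(8, 200, 0.5, level)))
```

The same hypothetical disks deliver very different front-end numbers depending on the RAID level and the write ratio, which is why both need to be known before interpreting a benchmark.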

Application

  • How many CPU cores does it consume while idle/peak/bursts?
  • Is it NUMA aware (QPI traffic)?
  • How much RAM does it consume?
  • How efficient is the application itself?
  • How much throughput/IOPS does it generate?

Common Example for Hardware and Application Interaction

You have just upgraded your hardware to the latest-generation CPUs. The server has two sockets, and your application's performance dropped by 30%. Although the new sockets are much faster than the old ones, calls to a remote socket are still far more expensive than local ones (up to 7x more) on the latest architectures. Simply pinning your application to a specific NUMA node may greatly increase its performance and turn that -30% into a +30% (assuming the CPU is the bottleneck).
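As a rough illustration of the pinning idea on Linux, the sketch below reads the CPU list of one NUMA node from sysfs and restricts a process to those CPUs with os.sched_setaffinity. It is an assumption-laden example, not the method used for the numbers above: the helper names are hypothetical, it only works on Linux, and it pins CPUs only. Binding memory to the node is usually done with a tool such as numactl (for example, numactl --cpunodebind=0 --membind=0).

```python
import os


def parse_cpulist(text):
    """Parse a sysfs CPU list such as '0-3,8-11' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node, pid=0):
    """Restrict a process (pid=0 means the caller) to the CPUs of one NUMA node.

    Linux only. This sets CPU affinity; it does not bind memory allocations.
    """
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as handle:
        cpus = parse_cpulist(handle.read())
    os.sched_setaffinity(pid, cpus)
    return cpus


if __name__ == "__main__":
    # Only the parser is exercised here, so the demo runs on any OS.
    print(sorted(parse_cpulist("0-3,8-11")))
```

Calling pin_to_numa_node(0) at the start of a Linux process would keep its threads on node 0, so that, with the default local allocation policy, most of its memory tends to be allocated on the same node.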

Application Modeling

I strongly believe that modeling your application workloads and then scaling them is a better methodology than classic performance testing alone. In this document, I will demonstrate how I modeled different application workloads using VDbench. That does not mean that classic workloads do not have their place: they are still a great tool when you need to find those corner cases.


Tooling

VDbench, like FIO, is a well-known IO generator within the storage community. However, VDbench has many useful features that make it ideal for modeling applications. It also supports a variety of operating systems, which makes it a great tool for apples-to-apples comparisons across them. It is a free tool, and it can be downloaded here.

Databases and Application Pattern Modeling

Different applications generate IO with different block sizes and different operations. Therefore, profiling those patterns correctly is a key factor in modeling your application. There are various ways to profile an application, but they are outside the scope of this document. The table below shows the breakdown of the application patterns I am currently using. These are not tied to a specific application; they are the common ground between the various databases and applications that I have profiled over the years:

| Database/App type | Used in | IO size | Random/sequential | Read/write | Percentage |
|---|---|---|---|---|---|
| OLTP1 | Mail applications; Online Transaction Processing | 4KB | random | read hit | 10 |
| | | 4KB | random | read | 35 |
| | | 4KB | random | write | 35 |
| | | 4KB | sequential | read | 5 |
| | | 4KB | sequential | write | 15 |
| OLTP2 | Small Oracle applications; Small weight transactions | 8KB | random | read hit | 20 |
| | | 8KB | random | read | 45 |
| | | 8KB | random | write | 15 |
| | | 64KB | sequential | read | 10 |
| | | 64KB | sequential | write | 10 |
| OLTPHW | Large Oracle applications; heavyweight transactions | 8KB | random | read hit | 10 |
| | | 8KB | random | read | 35 |
| | | 8KB | random | write | 35 |
| | | 64KB | sequential | read | 5 |
| | | 64KB | sequential | write | 15 |
| ODSS2 | Data warehouse applications; Backup applications | 4KB | random | read | 15 |
| | | 4KB | random | write | 5 |
| | | 64KB | sequential | read | 70 |
| | | 64KB | sequential | write | 10 |
| ODSS128 | Streaming applications; Backup applications | 64KB | random | read hit | 18 |
| | | 64KB | random | read | 18 |
| | | 64KB | random | write | 4 |
| | | 128KB | sequential | read | 48 |
| | | 128KB | sequential | write | 12 |
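One way to keep such a pattern honest is to hold it as data and check that the percentages add up to 100 before turning it into a workload. The sketch below does that for the OLTP1 rows and emits one VDbench-style workload definition line per stream. It is only an illustration and not the config files referenced later in this document: the mapping of "read hit" to rhpct=100, the sd=sd* wildcard, and the wd1..wd5 names are my assumptions, so check any generated line against the VDbench documentation before using it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    xfersize_kb: int
    pattern: str    # "random" or "sequential"
    operation: str  # "read", "read hit", or "write"
    percent: int    # share of the whole workload


# OLTP1 rows copied from the table above.
OLTP1 = [
    Stream(4, "random", "read hit", 10),
    Stream(4, "random", "read", 35),
    Stream(4, "random", "write", 35),
    Stream(4, "sequential", "read", 5),
    Stream(4, "sequential", "write", 15),
]


def validate(streams):
    """Raise if the stream percentages do not add up to exactly 100."""
    total = sum(stream.percent for stream in streams)
    if total != 100:
        raise ValueError(f"percentages add up to {total}, expected 100")


def to_wd_lines(streams):
    """Render each stream as a VDbench-style 'wd=' line (illustrative only)."""
    validate(streams)
    lines = []
    for index, stream in enumerate(streams, start=1):
        is_read = stream.operation != "write"
        parts = [
            f"wd=wd{index}",
            "sd=sd*",  # assumption: apply to every storage definition
            f"xfersize={stream.xfersize_kb}k",
            f"rdpct={100 if is_read else 0}",
            f"seekpct={100 if stream.pattern == 'random' else 0}",
        ]
        if stream.operation == "read hit":
            parts.append("rhpct=100")  # assumption: model cache hits via rhpct
        parts.append(f"skew={stream.percent}")
        lines.append(",".join(parts))
    return lines


if __name__ == "__main__":
    print("\n".join(to_wd_lines(OLTP1)))
```

The other patterns in the table can be expressed the same way by changing the list of Stream rows.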

Short Disclaimer:
Note that in my modeling, I used only the main block sizes observed in each workload.

For example, for OLTP1, I used five streams of the 4KB block. In fact, when profiling workloads such as OLTP1, you will see many more block sizes, such as 8, 16, 28, 64, 120, and 512 KB, as well as fractions of blocks such as 0.2, 0.44, and 0.68 KB, among others. But the occurrence of those blocks is inconsistent and extremely varied, so to get repeatable, consistent results, I am using only the main blocks in play. The same applies to every other application pattern in the table above.

The block sizes I decided to use make up the majority of the workload. For example, in OLTP1, 4KB operations accounted for ~90% of the workload in total. Real application data will also be compressible and dedupable to some degree, which will affect your performance if you use compression or deduplication. But that is a completely different topic. I will just point out that VDbench supports compressible and dedupable data generation.
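The idea of keeping only the main block sizes can be sketched as a small selection step over a profiled block-size histogram. The histogram below is entirely hypothetical and was chosen only so that 4KB lands at 90% of operations, in line with the OLTP1 remark above. The function name and the coverage threshold are likewise illustrative assumptions.

```python
def main_block_sizes(histogram, coverage_pct=90):
    """Return the most frequent block sizes that together reach coverage_pct.

    histogram maps block size in KB to the number of observed operations.
    """
    total = sum(histogram.values())
    if total == 0:
        return []
    selected = []
    covered = 0
    # Walk block sizes from most to least frequent.
    for size, count in sorted(histogram.items(), key=lambda item: -item[1]):
        selected.append(size)
        covered += count
        if covered * 100 >= coverage_pct * total:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical operation counts per block size (KB), for illustration only.
    profiled = {4: 900, 8: 40, 16: 30, 64: 20, 512: 10}
    print(main_block_sizes(profiled))      # the dominant size(s)
    print(main_block_sizes(profiled, 95))  # a stricter threshold keeps more
```

Raising the threshold pulls in more of the rare block sizes, at the cost of the repeatability that motivated dropping them in the first place.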

Simple Application Pattern Modeling

Databases are not the only workloads that can be profiled and simulated. Sometimes you will have many users running desktop applications, such as Microsoft Office. Since these are simple applications, we can also accurately predict the amount of RAM and CPU they will consume on average. In my case, I calculated the following consumption:

Each Microsoft Excel instance consumes 1 core and 200 MiB + (3 * file size) of RAM. For example, a user working on a 7 MiB file will consume 1 core and 221 MiB of RAM.
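The consumption rule above is simple enough to turn into a small sizing helper. This is a minimal sketch with a hypothetical function name, and it assumes every user opens one file of the same size.

```python
def excel_footprint(file_size_mib, users=1):
    """Return (cores, ram_mib) for a number of Excel users.

    Uses the rule above: 1 core and 200 MiB + 3 * file size per instance.
    Assumes one instance per user and the same file size for everyone.
    """
    ram_per_user = 200 + 3 * file_size_mib
    return users, users * ram_per_user


if __name__ == "__main__":
    print(excel_footprint(7))       # a single user on a 7 MiB file
    print(excel_footprint(7, 100))  # the same profile scaled to 100 users
```

Scaling the user count this way gives a first estimate of the CPU and RAM a desktop-style workload needs before any storage testing starts.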

Here is an example of a profiled Microsoft Excel pattern:

| Application | Used in | IO size | Random/sequential | Read/write | Percentage |
|---|---|---|---|---|---|
| Microsoft Excel | User’s Desktops | 52KB | random | write | 55 |
| | | 64KB | random | write | 40 |
| | | 6MiB | random | read | 5 |
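A quick calculation on the Excel pattern shows why the rare large reads still matter. The sketch below computes the average transfer size per operation and the share of bytes that come from reads. It assumes that the percentage column is a share of operations, that "52k" and "64KB" mean 52 and 64 KiB, and that 6MiB is 6144 KiB.

```python
# (size in KiB, operation, percent of operations) from the Excel pattern above.
EXCEL_PATTERN = [
    (52, "write", 55),
    (64, "write", 40),
    (6 * 1024, "read", 5),
]


def average_io_kib(pattern):
    """Average transfer size per operation, weighted by operation share."""
    return sum(size * pct for size, _, pct in pattern) / 100


def read_byte_share(pattern):
    """Fraction of all transferred bytes that belong to read operations."""
    total = sum(size * pct for size, _, pct in pattern)
    reads = sum(size * pct for size, op, pct in pattern if op == "read")
    return reads / total


if __name__ == "__main__":
    print(f"average IO size: {average_io_kib(EXCEL_PATTERN):.1f} KiB")
    print(f"bytes that are reads: {read_byte_share(EXCEL_PATTERN):.0%}")
```

Under these assumptions, reads are only 5% of the operations but account for most of the bytes moved, so the workload behaves very differently depending on whether you look at IOPS or at throughput.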


Config Files Examples

VDbench Databases Config Files for Filesystem

These config files are currently set to run on Windows. To run them on a UNIX/Linux-based OS, just modify the paths in the hd (host definition) and fsd (filesystem storage definition) parameters.

A few notes:

  • The first test that runs is actually a “fillup,” which fills the files with random data, so that the application patterns have real data to read.
  • All application patterns are set to run with threads=1, which on my setup keeps the queue depth low and yields the lowest latency.
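The link between threads, queue depth, and latency follows from Little's law: the number of IOs in flight equals the IO rate multiplied by the average latency. The sketch below applies it under the assumption that each thread keeps exactly one synchronous IO outstanding. The latency values are hypothetical and chosen only to show the shape of the relationship.

```python
def iops_from_latency(outstanding_ios, latency_ms):
    """Little's law: IOPS = IOs in flight / average latency in seconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return outstanding_ios / (latency_ms / 1000.0)


if __name__ == "__main__":
    # Hypothetical latencies: more IOs in flight usually raise both IOPS and latency.
    for threads, latency_ms in ((1, 0.5), (8, 1.0), (32, 4.0)):
        print(threads, "threads ->", round(iops_from_latency(threads, latency_ms)), "IOPS")
```

With a single outstanding IO, the achievable IOPS is simply the inverse of the latency, which is why a threads=1 run is a useful way to measure the best-case latency of a device rather than its peak throughput.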


The above workload examples can be found here.

VDbench Databases Config Files for RAW Disk

Note that the RAW config files are very different from the filesystem config files. The above workload examples can be found here.

VDbench Microsoft Excel Config File for Filesystem

Note that the fwdrate parameter (filesystem operations per second) is set to 1, meaning that I am simulating a single user. Also, the fsd parameter (filesystem storage definition name) is set to 7, meaning that our user is working on 7 different files, each 40 MiB in size. The above workload examples can be found here.

VDbench Generic Config Files for Filesystem

A few examples of running generic performance tests, for both block-specific and mixed workloads, can be found here.

FIO Databases Config Files

The above database workloads can also be run with fio. However, note that to replicate the same percentage ratio of different blocks within the patterns, I used the “flow” flag, which is a bit buggy and is currently not working properly. The above workload examples can be found here.