Preface
When evaluating storage performance, whether for local disks or OpenShift Container Storage on hyper-converged infrastructure or for traditional storage, there are two common types of performance benchmarking:
- Application performance - a mix of different IO operations that can contain a variety of block sizes
- Generic performance - a single block size performing one specific operation
Just for clarification, by IO operation, I mean read/write/mix operations with a random/sequential/cache hit IO pattern. Before approaching any type of benchmarking, we must know the capabilities of our hardware first:
Hardware
- Disk speed and number of disks, and the RAID level if one is used (RAID penalty)
- CPU and RAM clock speeds and architectures
- Number of lanes and bus speed, which affect your overall possible bandwidth
- BIOS version/DIMM layout should be configured according to vendor best practices.
- NIC speeds and latency.
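To make the RAID penalty item above concrete, here is a minimal sketch that estimates front-end IOPS from raw disk IOPS. The penalty values are the commonly cited rule-of-thumb figures, not measurements from this article, and the function name and parameters are hypothetical:

```python
# Commonly cited write penalties: back-end IOs needed per front-end write.
# These are rule-of-thumb values, not numbers taken from this article.
RAID_WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(disk_iops: float, disks: int, raid_level: str, write_pct: float) -> float:
    """Estimate front-end IOPS for a read/write mix.

    write_pct is the write share of the workload, from 0 to 100.
    """
    if not 0 <= write_pct <= 100:
        raise ValueError("write_pct must be between 0 and 100")
    penalty = RAID_WRITE_PENALTY[raid_level.lower()]
    raw_iops = disk_iops * disks
    write_fraction = write_pct / 100
    read_fraction = 1 - write_fraction
    # Each front-end read costs one back-end IO; each write costs `penalty`.
    return raw_iops / (read_fraction + write_fraction * penalty)


if __name__ == "__main__":
    # Eight disks at 200 IOPS each, RAID 5, 50% writes.
    print(effective_iops(200, 8, "raid5", 50))
```

This ignores controller caches and stripe-aligned writes, so treat the result as a rough ceiling to compare your benchmark numbers against, not as a prediction.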
Application
- How many CPU cores does it consume while idle/peak/bursts?
- Is it NUMA aware (QPI traffic)?
- How much RAM does it consume?
- How efficient is the application itself?
- How much throughput/IOPS does it generate?
Common Example for Hardware and Application Interaction
You just upgraded your hardware to the latest-generation CPUs. The server has two sockets, and your application's performance dropped by 30%. Although the new CPUs are much faster than the old ones, on current architectures it is still much more expensive (up to 7x more) to make calls across to the remote socket. So just by pinning your application to a specific NUMA node, you may greatly increase application performance and turn that -30% into a +30% (assuming a CPU bottleneck).
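As an illustration of the pinning idea, here is a minimal Linux-only sketch that restricts a process to the CPUs of one NUMA node. It assumes the standard sysfs layout under /sys/devices/system/node; the helper names are hypothetical and this is not the method used for the results in this article:

```python
import os


def parse_cpulist(cpulist: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_to_numa_node(node: int, pid: int = 0) -> set[int]:
    """Restrict a process (pid 0 means the caller) to the CPUs of one NUMA node."""
    # Assumption: the kernel exposes the per-node CPU list at this sysfs path.
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Note that this only controls CPU placement. Memory placement is a separate setting, which tools such as numactl handle with --cpunodebind and --membind.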
Application Modeling
I strongly believe that modeling your application workloads and then scaling them is a better methodology than just doing classic performance testing. In this document, I will demonstrate how I modeled different application workloads using VDbench. That does not mean that classic workloads do not have their place. They are still a great tool to use when you need to find those corner cases.
Tooling
VDbench, similar to FIO, is a well-known IO generator within the storage community. However, VDbench has a lot of useful features that make it ideal for modeling applications. It also supports a variety of operating systems, which makes it a great tool for doing an apples-to-apples comparison across them. It is a free tool, and it can be downloaded here.
Databases and Application Pattern Modeling
Different applications generate different block sizes and run various operations on them. Therefore, profiling the patterns correctly is a key factor in modeling your application. There are various ways to do application profiling, but that is not within the scope of this document. The table below shows the breakdown of the application patterns I am currently using. These are not tied to a specific application, but are more of a common ground between the various databases/applications that I have profiled over the years:
| Database/App type | Used in | IO size | Random/sequential | Read/write | Percentage |
|---|---|---|---|---|---|
| OLTP1 | Mail applications | 4KB | random | read hit | 10 |
| | | 4KB | random | read | 35 |
| | | 4KB | random | write | 35 |
| | | 4KB | sequential | read | 5 |
| | | 4KB | sequential | write | 15 |
| OLTP2 | Small Oracle applications, small weight transactions | 8KB | random | read hit | 20 |
| | | 8KB | random | read | 45 |
| | | 8KB | random | write | 15 |
| | | 64KB | sequential | read | 10 |
| | | 64KB | sequential | write | 10 |
| OLTPHW | Large Oracle applications, heavyweight transactions | 8KB | random | read hit | 10 |
| | | 8KB | random | read | 35 |
| | | 8KB | random | write | 35 |
| | | 64KB | sequential | read | 5 |
| | | 64KB | sequential | write | 15 |
| ODSS2 | Data warehouse applications | 4KB | random | read | 15 |
| | | 4KB | random | write | 5 |
| | | 64KB | sequential | read | 70 |
| | | 64KB | sequential | write | 10 |
| ODSS128 | Streaming applications, backup applications | 64KB | random | read hit | 18 |
| | | 64KB | random | read | 18 |
| | | 64KB | random | write | 4 |
| | | 128KB | sequential | read | 48 |
| | | 128KB | sequential | write | 12 |
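Before feeding patterns like these into an IO generator, it can help to hold them as data and sanity-check them. The sketch below is one possible representation (the structure and function names are my own, hypothetical choices): it copies the table values, checks that each pattern adds up to 100%, and derives the weighted average IO size and the read share.

```python
# (io_size_kb, access, operation, percentage) per stream, copied from the table.
PATTERNS = {
    "OLTP1": [(4, "random", "read hit", 10), (4, "random", "read", 35),
              (4, "random", "write", 35), (4, "sequential", "read", 5),
              (4, "sequential", "write", 15)],
    "OLTP2": [(8, "random", "read hit", 20), (8, "random", "read", 45),
              (8, "random", "write", 15), (64, "sequential", "read", 10),
              (64, "sequential", "write", 10)],
    "OLTPHW": [(8, "random", "read hit", 10), (8, "random", "read", 35),
               (8, "random", "write", 35), (64, "sequential", "read", 5),
               (64, "sequential", "write", 15)],
    "ODSS2": [(4, "random", "read", 15), (4, "random", "write", 5),
              (64, "sequential", "read", 70), (64, "sequential", "write", 10)],
    "ODSS128": [(64, "random", "read hit", 18), (64, "random", "read", 18),
                (64, "random", "write", 4), (128, "sequential", "read", 48),
                (128, "sequential", "write", 12)],
}


def check_percentages(patterns):
    """Raise ValueError if any pattern's stream percentages do not sum to 100."""
    for name, streams in patterns.items():
        total = sum(pct for _, _, _, pct in streams)
        if total != 100:
            raise ValueError(f"{name} sums to {total}, expected 100")


def average_io_size_kb(streams):
    """Percentage-weighted average IO size of a pattern, in KB."""
    return sum(size * pct for size, _, _, pct in streams) / 100


def read_share(streams):
    """Total percentage of read operations, including read hits."""
    return sum(pct for _, _, op, pct in streams if op.startswith("read"))


if __name__ == "__main__":
    check_percentages(PATTERNS)
    for name, streams in PATTERNS.items():
        print(name, average_io_size_kb(streams), read_share(streams))
```

The average IO size is a quick way to relate an IOPS figure to throughput for a mixed pattern, since throughput is roughly IOPS multiplied by that average.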
Short Disclaimer:
Note that in my modeling, I only used the main block sizes that appeared in each workload.
For example, for OLTP1, I used five streams of the 4KB block, but in fact, when profiling workloads such as OLTP1, you will see that there are a lot more block sizes, such as 8, 16, 28, 64, 120, and 512 KB, and also fractions of blocks such as 0.2, 0.44, and 0.68 KB, among others. But the occurrence of those blocks is inconsistent and extremely varied, so to get repeatable, consistent results, I am only using the main blocks in play. That also applies to every other application pattern in the table above.
The block sizes I decided to use make up the majority of the workload. For example, in OLTP1, 4KB operations accounted for ~90% of the workload in total. Real application data will also be compressible and dedupable to some degree, which will also affect your performance if you use compression or deduplication. But that is a completely different topic. I will just point out that VDbench supports compressible and dedupable data generation.
Simple Application Pattern Modeling
Databases are not the only thing that can be profiled and simulated. Sometimes you will have many users running desktop applications, such as Microsoft Office. Since these are simple applications, we can also accurately predict the amount of RAM and CPU they will consume on average. In my case, I calculated the following consumption:
Each Microsoft Excel instance consumes 1 core and 200 MiB + (3 * file size) of RAM. For example, a user working on a 7 MiB file will consume 1 core and 221 MiB of RAM.
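The sizing rule above is simple enough to express as a small helper, which is handy when scaling from one user to many. This is only a sketch of that arithmetic; the function names are hypothetical, and the assumption that every user works on a file of the same size is mine:

```python
def excel_ram_mib(file_size_mib: float) -> float:
    """RAM for one Excel instance: 200 MiB plus three times the file size."""
    return 200 + 3 * file_size_mib


def excel_fleet(users: int, file_size_mib: float) -> dict:
    """Cores and RAM for `users` instances, assuming identical file sizes."""
    return {
        "cores": users,  # 1 core per Excel instance
        "ram_mib": users * excel_ram_mib(file_size_mib),
    }


if __name__ == "__main__":
    print(excel_ram_mib(7))     # one user on a 7 MiB file
    print(excel_fleet(100, 7))  # one hundred such users
```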
Here is an example of a profiled Microsoft Excel pattern:
| Application | Used in | IO size | Random/sequential | Read/write | Percentage |
|---|---|---|---|---|---|
| Microsoft Excel | User's desktops | 52KB | random | write | 55 |
| | | 64KB | random | write | 40 |
| | | 6MiB | random | read | 5 |
Config Files Examples
VDbench Databases Config Files for Filesystem
These config files are currently set to run on Windows, but to run them on a UNIX/Linux-based OS, just modify the path in the hd (host definition) and fsd parameters.
A few notes:
- The first test that runs is actually a “fillup,” which fills up the files with random data. That is done for the application patterns to have real data to read.
- All application patterns are set to run with threads=1, which for my setup will ensure that queue depth will be low, to yield the lowest latency.
The above workload examples can be found here.
VDbench Databases Config Files for RAW Disk
Note that RAW config files are extremely different from the filesystem config. The above workload examples can be found here.
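To give a feel for what a raw-disk workload definition looks like, here is a minimal sketch that turns the OLTP1 streams from the table in this article into Vdbench wd= lines. It does not reproduce the linked config files. It relies on my reading of the Vdbench workload parameters (xfersize, rdpct, rhpct, seekpct, skew), and the SD name sd1 and the wd naming scheme are hypothetical placeholders:

```python
# (io_size_kb, access, operation, percentage) for OLTP1, from the table in this article.
OLTP1 = [
    (4, "random", "read hit", 10),
    (4, "random", "read", 35),
    (4, "random", "write", 35),
    (4, "sequential", "read", 5),
    (4, "sequential", "write", 15),
]


def render_wd_lines(name, streams, sd="sd1"):
    """Render one Vdbench wd= line per stream; `sd` is a placeholder SD name."""
    lines = []
    for i, (size_kb, access, operation, pct) in enumerate(streams, start=1):
        fields = [f"wd={name}_{i}", f"sd={sd}", f"xfersize={size_kb}k"]
        # rdpct=100 is all reads, rdpct=0 is all writes.
        fields.append("rdpct=100" if operation.startswith("read") else "rdpct=0")
        if operation == "read hit":
            fields.append("rhpct=100")  # ask for cache-hit reads
        # seekpct=100 is fully random, seekpct=0 is sequential.
        fields.append("seekpct=100" if access == "random" else "seekpct=0")
        fields.append(f"skew={pct}")  # share of the total IO rate
        lines.append(",".join(fields))
    return lines


if __name__ == "__main__":
    print("\n".join(render_wd_lines("oltp1", OLTP1)))
```

A complete parameter file would also need a storage definition (sd=) pointing at the raw device and a run definition (rd=), which are left out here on purpose.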
VDbench Microsoft Excel Config File for Filesystem
Note that the fwdrate parameter (file system operations per second) is set to 1, meaning that I am currently simulating a single user. Also, the fsd parameter (filesystem storage definition) is set to 7, meaning our user is working on 7 different files, each 40MiB in size. The above workload examples can be found here.
VDbench Generic Config Files for Filesystem
A few examples for running generic performance tests, for both block-specific and mixed workloads, can be found here.
FIO Databases Config Files
The above database workloads can also be run with fio. However, note that to replicate the same percentage ratio of different blocks within the patterns, I used the “flow” flag, which is a bit buggy and is currently not working properly. The above workload examples can be found here.