Customers are always looking for performance improvements from their servers. One persistent bottleneck has been the speed of disk access. Until fairly recently, servers were usually configured with banks of hard disk drives, or attached to Storage Area Networks (SANs), which are themselves huge banks of hard drives. Solid State Drives (and NVMe devices) offer better performance than spindle-based hard disk drives for most users. However, SSDs and NVMe devices are considerably more expensive.
One way to improve disk I/O performance is to combine the capacity of spindle-based HDDs with the access speed of SSDs. Some storage vendors sell hybrid drives that combine these two storage technologies. The same result can be achieved in Red Hat Enterprise Linux by configuring an SSD to act as a cache device for a larger HDD. This has the added benefit of letting you choose your storage vendor without relying on their cache implementation. As SSD prices drop and capacities increase, the cache devices can be replaced without worrying about the underlying data devices.
A supported solution in Red Hat Enterprise Linux is to use a dm-cache device. Because dm-cache is part of device-mapper, we don't need to worry about kernel modules or kernel configuration options, and no tuning was necessary for the tests described here.
However, it is worth knowing that dm-cache has been engineered for particular use cases: it is a 'hot-spot' cache, and it fills slowly. This design choice means that data is promoted to the cache only over multiple accesses, so the cache is populated slowly. As a result, streamed data will not be cached, and random access will not benefit either. Likewise, where files are created and destroyed frequently, dm-cache is unlikely to be of benefit.
This behavior contrasts with the more familiar kernel filesystem cache, which uses physical RAM to cache file access. The kernel filesystem cache is populated quickly, but it is also more volatile, and it cannot be targeted at a specific volume the way dm-cache can.
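The RAM-based kernel cache described above can be observed directly: the kernel reports its current size on the `Cached:` line of `/proc/meminfo`, in KiB. A minimal sketch that prints it in MiB (the function name `page_cache_mib` is hypothetical, chosen for illustration):

```shell
#!/bin/sh
# Print the current size of the kernel page cache in MiB.
# /proc/meminfo reports the "Cached:" figure in KiB.
# page_cache_mib is a hypothetical helper name used for illustration.
page_cache_mib() {
    awk '/^Cached:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}
```

Running `page_cache_mib` before and after `echo 3 > /proc/sys/vm/drop_caches` (as root) should show the figure fall, which is why the tests clear this cache before every timed copy.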
Setting up the performance testing environment
Given these use-case criteria, the tests do not use dd, focus on read speeds, and require multiple runs before significant performance benefits can be realized. My test hardware is a PC with three storage devices:
- 120GB mSATA 'disk' (/dev/sdc). This is where the OS has been installed, and for the purposes of testing is a 'fast' disk offering similar speeds to an SSD.
- 500GB SATA 2.5" HDD (/dev/sda). This is my 'slow' disk and will be the target location for data to be read from.
- 130GB SATA 2.5" SSD (/dev/sdb). This is my 'fast' disk and will be used as my cache device.
Red Hat Enterprise Linux 7.3 is installed on the mSATA disk (/dev/sdc), with the filesystem formatted as xfs. I then created a single partition on the 500GB HDD, an LVM volume group called data on that partition, and a single logical volume called 'slowdisk', which lvs reports as 400.00g. The logical volume was then formatted as xfs.
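The slow-disk layout can be reproduced with standard LVM and xfs commands. The sketch below is a reconstruction under assumptions, not the author's recorded commands: it assumes the HDD partition is `/dev/sda1`, mounts the volume at `/testing` (the path the copy tests read from), and sizes the LV at 400G to match the lvs output. Because `pvcreate` and `mkfs.xfs` destroy existing data, a hypothetical `DRY_RUN` switch makes it print the commands by default.

```shell
#!/bin/sh
# Hypothetical reconstruction of the 'slowdisk' setup; not the original commands.
# Commands are printed, not executed, unless DRY_RUN=0 is set (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

setup_slowdisk() {
    part=${1:-/dev/sda1}    # assumed: the single partition on the 500GB HDD
    mnt=${2:-/testing}      # assumed: mount point holding the test file
    # Each step runs only if the previous one succeeded.
    run pvcreate "$part" &&
    run vgcreate data "$part" &&
    run lvcreate -L 400G -n slowdisk data &&
    run mkfs.xfs /dev/data/slowdisk &&
    run mkdir -p "$mnt" &&
    run mount /dev/data/slowdisk "$mnt"
}
```

Review the printed commands first, then run `DRY_RUN=0 setup_slowdisk` as root only if they match your disks.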
I created a 2.5GB file on the slowdisk logical volume. Each test measures the time it takes to copy this file from slowdisk to the root filesystem. Because the root filesystem is hosted on the 'fast' mSATA disk, it is reasonable to assume that the time taken is dominated by the read from the slow disk.
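The article doesn't say how the test file was created. One possibility, sketched here as an assumption, is to fill it with random data using GNU `head`, so that reading it back really exercises the disk (unlike, for example, a sparse file whose holes are never read from disk). `make_testfile` is a hypothetical helper.

```shell
#!/bin/sh
# Create a test file of SIZE bytes filled with random data.
# SIZE may use GNU head suffixes, e.g. 2560M (MiB). This is one possible
# way to make such a file, not the article's recorded method.
make_testfile() {
    path=$1
    size=$2
    head -c "$size" /dev/urandom > "$path" &&
    sync    # flush the new file to disk before any timing starts
}
```

For example, `make_testfile /testing/2.5GB.testfile 2560M` interprets 2.5GB as 2560 MiB.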
The logical volumes are as follows:
```
[root@rhel-test ~]# lvs -a
  LV       VG   Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  slowdisk data -wi-ao---- 400.00g
  home     rhel -wi-ao----  60.29g
  root     rhel -wi-ao----  50.00g
  swap     rhel -wi-ao----   7.75g
```
Above is a diagram depicting the filesystem configuration.
First test
The first test is to copy the file from the slow disk to the root filesystem (hosted on the fast mSATA drive).
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m21.464s
user    0m0.018s
sys     0m1.719s
```
In the command above, the kernel file cache is cleared before the copy of the test file is timed. As noted earlier, the Linux kernel file cache can make file operations appear much faster than the underlying disks can actually perform. Because this article focuses specifically on the performance of the underlying disks, it is important to drop the file cache before each iteration of the test.
It is also advisable to repeat each test several times. For my testing, I repeated each test five times and took the mean of the copy times.
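The measurement procedure above can be scripted. This is a sketch under stated assumptions, not the author's script: the function names are hypothetical, it must run as root, it calls `sync` before dropping caches (drop_caches does not free dirty pages), and it records only the elapsed ("real") time using GNU `date` rather than the shell's `time` keyword.

```shell
#!/bin/sh
# Sketch of the benchmark loop: drop caches, time one copy, repeat, average.
# Must be run as root. Only elapsed (real) time is recorded.

mean() {
    # Arithmetic mean of the numeric arguments, printed to three decimals.
    printf '%s\n' "$@" | awk '{ s += $1 } END { if (NR) printf "%.3f\n", s / NR }'
}

time_one_copy() {
    src=$1
    dst=$2
    sync                                              # write dirty pages back first...
    echo 3 > /proc/sys/vm/drop_caches || return 1     # ...then drop clean caches
    start=$(date +%s.%N)                              # GNU date: seconds.nanoseconds
    cp "$src" "$dst" || return 1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

time_copies() {
    runs=${1:-5}
    src=${2:-/testing/2.5GB.testfile}
    dst=${3:-/root/}
    times=""
    i=1
    while [ "$i" -le "$runs" ]; do
        t=$(time_one_copy "$src" "$dst") || return 1
        echo "run $i: $t s" >&2
        times="$times $t"
        i=$((i + 1))
    done
    # $times is deliberately unquoted so each run becomes one argument.
    mean $times
}
```

`time_copies 5` prints each run's time on stderr and the mean on stdout.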
Slow disk to fast disk mean copy time:
- real: 21.467 seconds
- user: 0.0144 seconds
- sys: 1.7394 seconds
These values represent the baseline performance. Hopefully, by putting a cache disk in place, these values can be improved.
Setting up the cache
Next, the SSD partition (/dev/sdb1) is added to the data volume group, and a cachedisk logical volume is created on it, along with a smaller metadata volume called metadisk. Instructions for creating cache volumes are available on the Red Hat Customer Portal. The commands used were:
```
[root@rhel-test ~]# lvcreate -L 100G -n cachedisk data /dev/sdb1
  Logical volume "cachedisk" created.
[root@rhel-test ~]# lvcreate -L 4G -n metadisk data /dev/sdb1
  Logical volume "metadisk" created.
```
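The transcript starts at `lvcreate`, so the step that adds the SSD to the data volume group isn't shown. A sketch of the full preparation, assuming (our assumption) that it was done with `pvcreate` and `vgextend` on `/dev/sdb1`, using a hypothetical dry-run wrapper that only prints the commands unless `DRY_RUN=0`:

```shell
#!/bin/sh
# Hypothetical sketch: prepare the cache data and metadata LVs on the SSD.
# Prints the commands unless DRY_RUN=0 is set (then runs them; needs root).
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

prepare_cache_lvs() {
    ssd=${1:-/dev/sdb1}    # assumed partition on the SSD
    # Naming the PV on each lvcreate keeps both LVs on the SSD, not the HDD.
    run pvcreate "$ssd" &&
    run vgextend data "$ssd" &&
    run lvcreate -L 100G -n cachedisk data "$ssd" &&
    run lvcreate -L 4G -n metadisk data "$ssd"
}
```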
The two volumes are then converted into a cache pool, which is attached to the slowdisk logical volume.
```
[root@rhel-test ~]# lvconvert --type cache-pool /dev/data/cachedisk --poolmetadata /dev/data/metadisk
  Using 128.00 KiB chunk size instead of default 64.00 KiB, so cache pool has less then 1000000 chunks.
  WARNING: Converting logical volume data/cachedisk and data/metadisk to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert data/cachedisk and data/metadisk? [y/n]: y
  Converted data/cachedisk to cache pool.
[root@rhel-test ~]# lvconvert --type cache /dev/data/slowdisk --cachepool /dev/data/cachedisk
Do you want wipe existing metadata of cache pool volume data/cachedisk? [y/n]: y
  Logical volume data/slowdisk is now cached.
```
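The chunk-size message can be checked with simple arithmetic: 100 GiB is 104,857,600 KiB, which at the 64 KiB default would be 1,638,400 chunks, while 128 KiB gives 819,200, below the 1,000,000 quoted. The helper below reproduces that choice by doubling from 64 KiB. It is a simplification for illustration (`pick_chunk_kib` is hypothetical), not lvm2's actual code.

```shell
#!/bin/sh
# Smallest power-of-two multiple of 64 KiB that keeps the cache data LV
# under 1,000,000 chunks, mirroring the lvconvert message above.
# Simplified illustration only; sizes are in KiB.
pick_chunk_kib() {
    data_kib=$1
    chunk=64                                   # default chunk size shown above
    # Round the chunk count up, since a partial chunk still needs a slot.
    while [ $(( (data_kib + chunk - 1) / chunk )) -ge 1000000 ]; do
        chunk=$((chunk * 2))
    done
    echo "$chunk"
}
```

`pick_chunk_kib $((100 * 1024 * 1024))` prints 128 for the 100G cachedisk volume.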
The logical volume configuration can be checked using the lvs command:
```
[root@rhel-test ~]# lvs -a
  LV                VG   Attr       LSize   Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert
  [cachedisk]       data Cwi---C--- 100.00g                              0.00   0.16            0.00
  [cachedisk_cdata] data Cwi-ao---- 100.00g
  [cachedisk_cmeta] data ewi-ao----   4.00g
  [lvol0_pmspare]   data ewi-------   4.00g
  slowdisk          data Cwi-aoC--- 400.00g [cachedisk] [slowdisk_corig] 0.00   0.16            0.00
  [slowdisk_corig]  data owi-aoC--- 400.00g
  home              rhel -wi-ao----  60.29g
  root              rhel -wi-ao----  50.00g
  swap              rhel -wi-ao----   7.75g
```
The output shows that a cache, on the cachedisk logical volume, has been added to the slowdisk logical volume. Cache utilization is currently 0.00% (a small amount of metadata, 0.16%, has already been created).
The /dev/sdc device that hosts the root filesystem is omitted from this diagram; it is unchanged from the previous diagram.
First cache test
With the cache in place, the tests can be re-run:
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m21.560s
user    0m0.009s
sys     0m1.804s
```
“These numbers are rubbish! There’s no improvement by using the cache device, and I’ve wasted my money buying this expensive SSD!”
This is the first run since adding the cache device. There is currently no data cached, and it is therefore normal and expected that there would be no performance improvement. The test needs to be repeated.
Second cache test
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m22.312s
user    0m0.009s
sys     0m1.607s
```
“You’re joking?! The performance has actually dropped! This ‘caching’ just doesn’t work!”
Initially, the figures don’t look very promising at all. This second run has taken slightly longer than the first run. Looking at the logical volume properties:
```
[root@rhel-test ~]# lvs -a
  LV                VG   Attr       LSize   Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert
  [cachedisk]       data Cwi---C--- 100.00g                              0.15   0.16            0.00
  [cachedisk_cdata] data Cwi-ao---- 100.00g
  [cachedisk_cmeta] data ewi-ao----   4.00g
  [lvol0_pmspare]   data ewi-------   4.00g
  slowdisk          data Cwi-aoC--- 400.00g [cachedisk] [slowdisk_corig] 0.15   0.16            0.00
  [slowdisk_corig]  data owi-aoC--- 400.00g
  home              rhel -wi-ao----  60.29g
  root              rhel -wi-ao----  50.00g
  swap              rhel -wi-ao----   7.75g
```
Although the copy took slightly longer, we can see that the cachedisk is now beginning to be used. It is only 0.15% utilized, which equates to roughly 150MB of cache. Given that each run copies 2.5GB of data, the data being accessed has certainly not all been promoted to the cache yet.
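The 0.15% figure converts to a size with one multiplication. lvs reports sizes in binary units by default (lowercase `g` means GiB), so the 100.00g pool is 100 GiB, and 0.15% of it is about 154 MiB, in line with the rough 150MB above. The percentage on its own can be read with `lvs --noheadings -o data_percent data/slowdisk`; the helper name `pct_to_mib` below is hypothetical.

```shell
#!/bin/sh
# Convert a cache Data% value from lvs into MiB of cached data.
# Arguments: percentage, pool size in GiB (lvs lowercase 'g' = GiB).
pct_to_mib() {
    awk -v p="$1" -v g="$2" 'BEGIN { printf "%.1f\n", p / 100 * g * 1024 }'
}
```

`pct_to_mib 0.15 100` prints 153.6, and `pct_to_mib 2.50 100` prints 2560.0, the size of the test file.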
This, again, is by design. dm-cache is a hot-spot cache aimed primarily at read caching. The cache builds up slowly over time, promoting frequently accessed data; it does not fill up quickly with recently accessed data. This means there should be less 'cache thrashing', where items are repeatedly added to and dropped from the cache, and greater long-term performance benefits.
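To illustrate why a slow-fill cache needs several passes to warm up, here is a deliberately simplified toy model. It is not dm-cache's smq policy: it simply assumes that a block must be read a threshold number of times before it becomes eligible for promotion, and that only a limited number of blocks are promoted per pass. The function name and parameters are hypothetical.

```shell
#!/bin/sh
# Toy slow-fill cache model (NOT the smq algorithm).
# Each pass reads every block once. An uncached block is promoted once it
# has been read THRESHOLD times, but at most MAX_PROMOTE blocks are
# promoted per pass. Prints the number of cached blocks after each pass.
# Usage: simulate_slow_fill BLOCKS PASSES THRESHOLD MAX_PROMOTE
simulate_slow_fill() {
    awk -v nb="$1" -v np="$2" -v th="$3" -v mp="$4" 'BEGIN {
        nb += 0; np += 0; th += 0; mp += 0     # force numeric comparisons
        total = 0
        for (pass = 1; pass <= np; pass++) {
            promoted = 0
            for (b = 0; b < nb; b++) {
                if (cached[b]) continue        # already served from the cache
                hits[b]++                      # one more read from the slow disk
                if (hits[b] >= th && promoted < mp) {
                    cached[b] = 1
                    promoted++
                    total++
                }
            }
            printf "pass %d: %d/%d blocks cached\n", pass, total, nb
        }
    }'
}
```

With `simulate_slow_fill 10 4 2 3`, nothing is cached after the first pass, and the cache then grows by at most three blocks per pass, reaching 9 of 10 blocks after pass 4. The threshold ignores one-off reads and the per-pass limit makes filling gradual; together they trade warm-up time for less thrashing.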
For testing purposes, however, this means the test must be rerun several times to populate the cache before the performance benefit can be measured.
Third cache test
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m23.641s
user    0m0.006s
sys     0m1.614s
```
“Still slow, more doubts are beginning to creep in … Patience and persistence will be rewarded though.”
Fourth cache test
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m22.452s
user    0m0.004s
sys     0m1.620s
```
After four runs with the cache device, performance is roughly back to where we started. However, cache utilization is increasing. Hopefully, the performance payback is just around the corner.
```
[root@rhel-test ~]# lvs -a
  LV                VG   Attr       LSize   Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert
  [cachedisk]       data Cwi---C--- 100.00g                              0.43   0.16            0.00
  [cachedisk_cdata] data Cwi-ao---- 100.00g
  [cachedisk_cmeta] data ewi-ao----   4.00g
  [lvol0_pmspare]   data ewi-------   4.00g
  slowdisk          data Cwi-aoC--- 400.00g [cachedisk] [slowdisk_corig] 0.43   0.16            0.00
  [slowdisk_corig]  data owi-aoC--- 400.00g
  home              rhel -wi-ao----  60.29g
  root              rhel -wi-ao----  50.00g
  swap              rhel -wi-ao----   7.75g
```
Fifth cache test
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m20.279s
user    0m0.004s
sys     0m1.605s
```
The fastest copy time yet, albeit only about 1 second faster than with no cache at all. There should be room for further improvement, though, because the cache still holds only part of the 2.5GB being read.
Sixth cache test
```
[root@rhel-test ~]# echo 3 > /proc/sys/vm/drop_caches && time cp /testing/2.5GB.testfile /root/

real    0m19.778s
user    0m0.005s
sys     0m1.549s
```
More time has been shaved off the copy. At this point, we'll skip ahead many test runs.
Table of results
| Test run number | Mean (real) copy time (s) |
|---|---|
| 1 | 21.47 |
| 2 | 21.53 |
| ... | ... |
| 13 | 14.80 |
| ... | ... |
| 21 | 5.80 |
| 22 | 5.02 |
| 23 | 5.02 |
Table of mean real (elapsed) time for the file copy operation
Test run 1 was performed with no cache device configured. Test run 2 and all subsequent runs were performed with the cache device present.
Above is a graph depicting how the copy time decreased as more iterations of the test were run.
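The overall gain can be summarized from the table's first and last rows: 21.47 seconds with no cache and 5.02 seconds once the cache was warm. A small helper for that comparison (hypothetical, for illustration only):

```shell
#!/bin/sh
# Compare two mean copy times: speedup factor and percentage reduction.
# Usage: compare_times BEFORE_SECONDS AFTER_SECONDS
compare_times() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        printf "speedup: %.2fx\n", before / after
        printf "reduction: %.1f%%\n", (before - after) / before * 100
    }'
}
```

`compare_times 21.47 5.02` reports a speedup of 4.28x, a 76.6% reduction in copy time.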
After running the test 22 times, further iterations brought no additional performance improvement. The lvs output shows that the cache now holds 2.5GB of data (2.50% of the 100GB pool): the file that has been copied more than 20 times.
```
[root@rhel-test ~]# lvs -a
  LV                VG   Attr       LSize   Pool        Origin           Data%  Meta%  Move Log Cpy%Sync Convert
  [cachedisk]       data Cwi---C--- 100.00g                              2.50   0.16            0.00
  [cachedisk_cdata] data Cwi-ao---- 100.00g
  [cachedisk_cmeta] data ewi-ao----   4.00g
  [lvol0_pmspare]   data ewi-------   4.00g
  slowdisk          data Cwi-aoC--- 400.00g [cachedisk] [slowdisk_corig] 2.50   0.16            0.00
  [slowdisk_corig]  data owi-aoC--- 400.00g
  home              rhel -wi-ao----  60.29g
  root              rhel -wi-ao----  50.00g
  swap              rhel -wi-ao----   7.75g
```
Graph depicting how the cache utilization increased as more test runs were performed.
Conclusions and summary
The results above demonstrate that placing dm-cache on a fast device in front of a larger, slower disk can provide significant performance gains for specific use cases. Frequently read data is promoted to the cache, and the tests show that this can substantially increase read performance (the mean copy time fell from 21.47 seconds to 5.02 seconds). If data is read only once, dm-cache offers no improvement.
dm-cache can improve file access performance on a server, but it does not replace the kernel file cache, and it is not well suited to random file access. It excels as a read cache, where frequently accessed files (hot spots) are promoted to the cache over multiple accesses (a slow-fill cache). Once the cache is populated, read performance should increase.
The Linux kernel file cache will generally be considerably faster than an SSD- (or NVMe-) based dm-cache device, since physical RAM is still significantly faster than solid-state drives. However, an SSD-based dm-cache can survive a server reboot and is not as ephemeral as the Linux kernel file cache. The kernel also frees in-memory cache as processes demand memory, whereas a device-backed dm-cache provides a defined cache capacity.
References
The inspiration for this blog entry came from a customer case. Research led me to a three-year-old post on Richard Jones's blog, written while he was trying to optimize the performance of his virtual machines, and to the discussions he had with dm-cache developers on the linux-lvm mailing list.
It is worth noting that, as always, the software has moved on: most of the tuning parameters Richard was advised to use to alter dm-cache's behavior have since been deprecated. This is the result of a change in the cache policy used by dm-cache, which removed these parameters and made dm-cache much simpler to use.
A more recent discussion, this time on the dm-devel mailing list, details the newer cache policy (smq) and the tuning options available. For the tests in this blog, I didn't alter any of the default dm-cache settings. I displayed the cache utilization simply with the output of the 'lvs -a' command; however, other people have put together tools and scripts for this. One example that I found helpful was created by Armin Hammer.
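As an alternative to lvs, the device-mapper status line for the cached LV (for example, `dmsetup status data-slowdisk`) also reports read hits, read misses, promotions, and demotions. The sketch below parses that line using the field order given in the kernel's device-mapper cache documentation, with dmsetup prepending the start, length, and target type. It is an illustration under that assumption, not Armin Hammer's script, and the helper name `cache_summary` is hypothetical.

```shell
#!/bin/sh
# Summarise a dm-cache status line read from stdin, e.g.:
#   dmsetup status data-slowdisk | cache_summary
# Fields after "<start> <length> cache": metadata block size, used/total
# metadata blocks, cache block size, used/total cache blocks, read hits,
# read misses, write hits, write misses, demotions, promotions, dirty, ...
cache_summary() {
    awk '$3 == "cache" {
        split($7, used, "/")                   # used/total cache blocks
        reads = $8 + $9                        # read hits + read misses
        printf "cache blocks used: %d of %d (%.2f%%)\n", used[1], used[2], 100 * used[1] / used[2]
        printf "read hit ratio: %.1f%%\n", (reads ? 100 * $8 / reads : 0)
        printf "promotions: %d, demotions: %d\n", $13, $12
    }'
}
```

A rising read hit ratio across test runs would be one more way to watch the slow fill described above.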
Jonathan Ervine is a TAM from Hong Kong. He provides support to enterprise customers in the financial, logistics, and technology sectors in the APAC region. Recently, Jonathan has been helping his customers deploy private cloud infrastructure and maintain their existing deployments on a supported platform.
A Red Hat Technical Account Manager (TAM) is a specialized product expert who works collaboratively with IT organizations to strategically plan for successful deployments and help realize optimal performance and growth. The TAM is part of Red Hat’s world class Customer Experience and Engagement organization and provides proactive advice and guidance to help you identify and address potential problems before they occur. Should a problem arise, your TAM will own the issue and engage the best resources to resolve it as quickly as possible with minimal disruption to your business.
Connect with TAMs at a Red Hat Convergence event near you! Red Hat Convergence is a free, invitation-only event offering technical users an opportunity to deepen their Red Hat product knowledge and discover new ways to apply open source technology to meet their business goals. These events travel to cities around the world to provide you with a convenient, local one-day experience to learn and connect with Red Hat experts and industry peers.