SSDs are now commonplace and have been the default choice for performance-oriented storage in enterprise and consumer environments for the past few years. SSDs are cool and fast, but many people on high-end machines face this dilemma: My SSD sits behind a RAID controller that doesn't expose the device's DISCARD (TRIM) capability. How do I discard blocks to keep my SSD performing at its best? Here's a trick to do just that without having to disassemble your machine. Recent improvements in SSD firmware have made it less critical for applications writing to SSDs to issue DISCARD/TRIM requests.
There are, however, cases in which you may need the filesystem to inform the drive of the blocks it has discarded. Perhaps you have TLC (3 bits per cell) or QLC (4 bits per cell) drives instead of the usually more expensive enterprise-class SLC or MLC drives (the latter are less susceptible to a performance drop because they set aside more spare blocks to help with overwrites when the drive is at capacity). Or maybe you once filled your SSD to 100%, and now you cannot get the original performance/IOPS back.
On most systems, getting the performance back is usually a simple matter of issuing a filesystem trim (fstrim) command. Here's an example using a Red Hat Enterprise Linux (RHEL) system:
[root@System_A ~]# fstrim -av
/export/home: 130.5 GiB (140062863360 bytes) trimmed
/var: 26.1 GiB (28062511104 bytes) trimmed
/opt: 17.6 GiB (18832797696 bytes) trimmed
/export/shared: 31.6 GiB (33946275840 bytes) trimmed
/usr/local: 5.6 GiB (5959331840 bytes) trimmed
/boot: 678.6 MiB (711565312 bytes) trimmed
/usr: 36.2 GiB (38831017984 bytes) trimmed
/: 3 GiB (3197743104 bytes) trimmed
[root@System_A ~]#
There's one catch, though...
If your SSDs are behind a RAID volume attached to a RAID controller (HPE's SmartArray, Dell's PERC, or anything based on LSI/Avago's MegaRAID), here's what happens:
[root@System_B ~]# fstrim -av
[root@System_B ~]#
Just nothing. Nothing happens. At the end of the SCSI I/O chain, the capabilities a device exposes boil down to the device itself and the RAID driver it sits behind.
Let's take a closer look. Here's an SSD (a Samsung 860 EVO 2 TB drive) attached to a SATA connector on a RHEL system (we will name that system System_A in the rest of this document):
[root@System_A ~]# lsscsi
[3:0:0:0] disk ATA Samsung SSD 860 3B6Q /dev/sda
Here's an identical drive (same model, same firmware) behind a RAID controller (a PERC H730P) on a different system (let's call that system System_B in the rest of this document):
[root@System_B ~]# lsscsi
[0:2:0:0] disk DELL PERC H730P Adp 4.30 /dev/sda
How do I know it's the same drive? The RAID HBA can be queried with megaclisas-status, which shows this:
[root@System_B ~]# megaclisas-status
-- Controller information --
-- ID | H/W Model | RAM | Temp | BBU | Firmware
c0 | PERC H730P Adapter | 2048MB | 60C | Good | FW: 25.5.7.0005
-- Array information --
-- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS Path | CacheCade |InProgress
c0u0 | RAID-0 | 1818G | 512 KB | ADRA,WB | Enabled | Optimal | /dev/sda | None |None
-- Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID
c0u0p0 | SSD | S3YUNB0KC09340D Samsung SSD 860 EVO 2TB RVT03B6Q | 1.818 TB | Online, Spun Up | 6.0Gb/s | 23C | [32:0] | 0
Yes, it's the same drive (Samsung EVO 860) and the same firmware (3B6Q).
Using lsblk, we'll expose the DISCARD capabilities of those two devices:
[root@System_A ~]# lsblk -dD
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 512B 2G 1
[root@System_B ~]# lsblk -dD
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 0B 0B 0
Here's the culprit: all of the values are zero. The SSD in a RAID 0 behind a PERC H730P on System_B does not expose any DISCARD capabilities. This is why fstrim on System_B did nothing and returned nothing.
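You can script this check rather than eyeballing the columns. Here's a minimal sketch (the `supports_discard` helper is hypothetical, not a standard tool) that decides whether a device advertises DISCARD support from an `lsblk -dD` data line, where DISC-GRAN and DISC-MAX are columns 3 and 4:

```shell
# Sketch: decide whether a block device advertises DISCARD support by
# parsing one "NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO" data line
# from `lsblk -dD`. Hypothetical helper, shown for illustration only.
supports_discard() {
    # A device can be trimmed when DISC-GRAN ($3) and DISC-MAX ($4)
    # are both non-zero.
    echo "$1" | awk '{ exit ($3 != "0B" && $4 != "0B") ? 0 : 1 }'
}

# System_A's sda (native SATA): discard works
supports_discard "sda 0 512B 2G 1" && echo "sda: TRIM supported"

# System_B's sda (behind the PERC): no discard
supports_discard "sda 0 0B 0B 0" || echo "sda: no TRIM (RAID volume)"
```

A drive behind a megaraid_sas or hpsa volume reports 0B in both columns, so the check fails there, just as fstrim does.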
HPE SmartArray systems are affected in a similar way. Here's an HPE DL360 Gen10 with a high-end SmartArray RAID card:
[root@dl360gen10 ~]# lsblk -dD
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 0B 0B 0
sdc 0 0B 0B 0
sdd 0 0B 0B 0
sde 0 0B 0B 0
sdf 0 0B 0B 0
sdg 0 0B 0B 0
sdh 0 0B 0B 0
All LSI-based (megaraid_sas driver) and SmartArray-based (hpsa driver) systems suffer from this problem. If you wanted to TRIM your SSDs, you would have to shut down System_B, pull the drive out, connect it to a SAS/SATA-capable system, and run fstrim there.
Fortunately for us, there's a small trick to temporarily expose the native capabilities of your device and TRIM it. This requires taking down the application that uses your RAID drive, but at least it does not require you to walk to a data center to pull hardware out of a system.
The trick is to stop using the RAID drive through the RAID driver, expose the SSD as a JBOD, re-mount the filesystem, and TRIM it there. Once the blocks are discarded, simply put the drive back into RAID mode, mount the filesystem, and restart your applications.
There are a couple of caveats:
- The RAID hardware you are using must allow devices to be put in JBOD mode.
- You cannot do this on your boot disk as it would require taking down the OS.
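Before touching a real system, the whole round trip can be rehearsed as a dry run. This hypothetical `print_trim_sequence` helper only prints the commands in order; the VG name (testdg), enclosure:slot (32:2), target ID (3), and adapter (0) are example values, so substitute your own before running anything for real:

```shell
# Dry run: print the JBOD round-trip command sequence without executing
# anything. All four arguments are example values (assumptions), not
# discovered from hardware. Hypothetical helper for planning only.
print_trim_sequence() {
    vg=$1 slot=$2 tgt=$3 adp=$4
    cat <<EOF
umount /mnt
vgchange -a n $vg
vgexport $vg
MegaCli -AdpSetProp -EnableJBOD -1 -a$adp
MegaCli -CfgLdDel -L$tgt -a$adp
MegaCli -PDMakeJBOD -PhysDrv[$slot] -a$adp
partprobe
vgimport $vg
vgchange -a y $vg
mount /dev/$vg/lv_test /mnt
fstrim -v /mnt
umount /mnt
vgchange -a n $vg
vgexport $vg
MegaCli -AdpSetProp -EnableJBOD -0 -a$adp
MegaCli -CfgLdAdd -r0 [$slot] WB RA CACHED -strpsz 512 -a$adp
partprobe
vgimport $vg
vgchange -a y $vg
mount /dev/$vg/lv_test /mnt
EOF
}

print_trim_sequence testdg 32:2 3 0
```

Reviewing that printout against your own controller and VG layout is a cheap way to catch a wrong slot or adapter number before any logical drive is deleted.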
Walking through the process
Here is a small walk-through created on a system with a Dell PERC H730P and a Samsung SSD. We'll call this system System_C.
1) The SSD is at [32:2] on HBA a0, and we'll create a single RAID 0 drive from it:
[root@System_C ~]# MegaCli -CfgLdAdd -r0 [32:2] WB RA CACHED -strpsz 512 -a0
2) The new logical drive pops up as /dev/sdd and shows no DISCARD capabilities:
[root@System_C ~]# lsblk -dD
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
[....]
sdd 0 0B 0B 0
3) Next, create a partition, a volume group (VG), a logical volume, and a 128G ext4 filesystem on top of that device:
[root@System_C ~]# parted /dev/sdd
[root@System_C ~]# pvcreate /dev/sdd1
[root@System_C ~]# vgcreate testdg /dev/sdd1
[root@System_C ~]# lvcreate -L 128G -n lv_test testdg
[root@System_C ~]# mke2fs -t ext4 /dev/testdg/lv_test
[root@System_C ~]# mount /dev/testdg/lv_test /mnt
For the sake of this demonstration, we'll copy some data to /mnt.
4) Stop using the system and export the volume group:
[root@System_C ~]# umount /mnt
[root@System_C ~]# vgchange -a n testdg
0 logical volume(s) in volume group "testdg" now active
[root@System_C ~]# vgexport testdg
Volume group "testdg" successfully exported
5) Enable JBOD mode on the HBA:
[root@System_C ~]# MegaCli -AdpSetProp -EnableJBOD -1 -a0
Adapter 0: Set JBOD to Enable success.
Exit Code: 0x00
6) Delete the logical drive and turn the drive into a JBOD. On most RAID controllers, safety checks prevent you from making a JBOD out of a drive that is still part of a logical drive:
[root@System_C ~]# MegaCli -PDMakeJBOD -PhysDrv[32:2] -a0
Adapter: 0: Failed to change PD state at EnclId-32 SlotId-2.
Exit Code: 0x01
The solution here is to delete the logical drive first. This is purely a metadata operation on the controller and will not touch our data. However, make sure you have written down the exact command used to create the RAID 0 array in the first place, as you will need it to re-create the array later.
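Since the exact re-creation command is easy to lose track of, you might keep a tiny helper like this around (the `recreate_cmd` function is a hypothetical sketch, hardcoded to the WB/RA/CACHED policy used in this walkthrough) and write down its output before deleting the logical drive:

```shell
# Hypothetical helper: reconstruct the -CfgLdAdd command needed to
# re-create a single-drive RAID 0 from its parameters. The cache policy
# (WB RA CACHED) is fixed to match this walkthrough's array.
recreate_cmd() {
    # $1=RAID level, $2=enclosure:slot, $3=stripe size (KB), $4=adapter
    printf 'MegaCli -CfgLdAdd -r%s [%s] WB RA CACHED -strpsz %s -a%s\n' \
        "$1" "$2" "$3" "$4"
}

recreate_cmd 0 32:2 512 0
# prints: MegaCli -CfgLdAdd -r0 [32:2] WB RA CACHED -strpsz 512 -a0
```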
[root@System_C ~]# MegaCli -CfgLdDel -L3 -a0
Adapter 0: Deleted Virtual Drive-3(target id-3)
Exit Code: 0x00
[root@System_C ~]# MegaCli -PDMakeJBOD -PhysDrv[32:2] -a0
Adapter: 0: EnclId-32 SlotId-2 state changed to JBOD.
Exit Code: 0x00
7) Refresh the kernel's view of the disks and import your data:
[root@System_C ~]# partprobe
[root@System_C ~]# vgscan
Reading volume groups from cache.
Found exported volume group "testdg" using metadata type lvm2
Found volume group "rootdg" using metadata type lvm2
[root@System_C ~]# vgimport testdg
Volume group "testdg" successfully imported
[root@System_C ~]# vgchange -a y testdg
1 logical volume(s) in volume group "testdg" now active
[root@System_C ~]# mount /dev/testdg/lv_test /mnt
[root@System_C ~]# fstrim -v /mnt
/mnt: 125.5 GiB (134734139392 bytes) trimmed
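Note that fstrim reports sizes in GiB (2^30 bytes), so you can sanity-check the figure above yourself:

```shell
# fstrim prints GiB, i.e., bytes divided by 2^30 (1073741824):
awk 'BEGIN { printf "%.1f GiB\n", 134734139392 / 1073741824 }'
# prints: 125.5 GiB
```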
We have discarded the empty blocks on our filesystem. Let's put it back in a RAID 0 logical drive.
8) Unmount the filesystem and export the volume group:
[root@System_C ~]# umount /mnt
[root@System_C ~]# vgchange -a n testdg
0 logical volume(s) in volume group "testdg" now active
[root@System_C ~]# vgexport testdg
Volume group "testdg" successfully exported
9) Disable JBOD mode on the RAID controller:
[root@System_C ~]# MegaCli -AdpSetProp -EnableJBOD -0 -a0
Adapter 0: Set JBOD to Disable success.
Exit Code: 0x00
10) Re-create your logical drive:
[root@System_C ~]# MegaCli -CfgLdAdd -r0 [32:2] WB RA CACHED -strpsz 512 -a0
11) Ask the kernel to probe the disks and re-mount your filesystem:
[root@System_C ~]# partprobe
[root@System_C ~]# vgscan
Reading volume groups from cache.
Found exported volume group "testdg" using metadata type lvm2
Found volume group "rootdg" using metadata type lvm2
[root@System_C ~]# vgimport testdg
Volume group "testdg" successfully imported
[root@System_C ~]# vgchange -a y testdg
1 logical volume(s) in volume group "testdg" now active
[root@System_C ~]# mount /dev/testdg/lv_test /mnt
Your data should be there, and the performance of your SSD should be back to its original figures.
Wrapping up
Here are a few additional notes:
- This procedure should be taken with a grain of salt and with a large warning: DO NOT perform this unless you are confident you can identify logical drives and JBODs on a Linux system.
- I have only tested this procedure with RAID 0 logical drives. It is unlikely to work for other RAID levels (5, 6, 1+0, etc.), because the data is striped or parity-distributed across several drives, so no single disk exposed as a JBOD would contain a complete filesystem.
- Please do not perform this procedure without verified backups.
About the author
Vincent Cojot is an unrepentant geek. When he's not fixing someone else's OpenStack or RHEL environment, he can be seen trying to squeeze the most efficient use of resources out of server hardware. After a career in the banking and telco industries, he thinks there's nothing as much fun as working for an open-source software company.