Continuing with our multi-part topic, we again find ourselves deep in the dark realm of storage subsystems. Once again, we will be venturing below the application layer and into the filesystem where all your data resides. This time, we’ll again be digging into the land of Red Hat Enterprise Linux software raid (mdraid) and looking at Raid 5 expansion with an XFS filesystem.
If you did not catch our prior foray into these areas, check out Expanding the capacity of a Red Hat Enterprise Linux MD raid part 2.
At some point during the course of managing and/or re-configuring backing storage devices, capacities can change. Generally speaking, in most cases, the capacity increases. This can happen for a variety of reasons:
An HW raid volume drive upgrade/replacement with larger drive(s)
Example: Online capacity expansion of a LSI Megaraid Raid1 volume
Expansion of a remote ISCSI, AOE or FIBRE Channel block device
An underlying block or file backed storage device attached to a Virtual Machine has been expanded
The contents of a device which is part of an existing mdraid device has been cloned to, and replaced with a larger device
Some other scenario not listed here where a block storage device has been grown in size
Any of the latter block storage devices can be used with Red Hat Enterprise Linux software raid (mdraid). Generally, the configuration of a mdraid device is commonly regarded as somewhat static in nature. Most of the time mdraid devices are configured from underlying storage devices and are sent on their way to become a physical member (PV) of a Logical Volume Manager group (VG). Device expansion, specifically with the likes of LVM, is relatively easy. However, with mdraid, capacity expansion is not generally a topic covered very well, nor is it especially well documented. That said, with a bit of black magic, nerves of steel and a steady hand growing your mdraid is possible. So let's go ahead and dig in using a virtual machine for today’s example.
Before we begin, here are a few disclaimers:
Expanding a mdraid will always involve the filesystem/mdraid device being taken offline.
As with any potentially risky change to a storage subsystem, ensure you have appropriate backups of your data!
Today’s scenario finds us wanting to expand a md raid5 array formatted with an XFS filesystem.
Grow a MD Raid5 array consisting of five devices with an xfs filesystem.
First we need to find out about the block devices present in the system:
# lsscsi --size [0:0:0:0] cd/dvd ATA CD-ROM 1.0 /dev/sr0 - [1:0:0:0] disk ATA HARDDISK 1.0 /dev/sda 8.58GB [2:0:0:0] disk ATA HARDDISK 1.0 /dev/sdb 1.07GB [3:0:0:0] disk ATA HARDDISK 1.0 /dev/sdc 1.07GB [4:0:0:0] disk ATA HARDDISK 1.0 /dev/sdd 1.07GB [5:0:0:0] disk ATA HARDDISK 1.0 /dev/sde 1.07GB [5:0:0:0] disk ATA HARDDISK 1.0 /dev/sdf 1.07GB
It looks like /dev/sdb, /dev/sdc, /dev/sdd, and /dev/sde would be good candidates to use, but let's check first and make sure they are not mounted and being used.
# mount | grep sd[abcdef] /dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered)
After checking we see that /dev/sdb, /dev/sdc, /dev/sdd, and /dev/sde are free and clear to use, so we can go ahead and create our raid5 array with them.
# mdadm --create /dev/md127 --name=foo --level=raid5 --raid-devices=5 --chunk=128K /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf mdadm: Defaulting to version 1.2 metadata mdadm: array /dev/md127 started.
Now that we have created a new array, we need to create or check for the presence of /etc/mdadm.conf. By default, mdadm does not write out a configuration file (/etc/mdadm.conf). It instead relies upon scanning /proc/partitions, device superblock/bitmaps data and /proc/mdstat to assemble any arrays which are found.
First we need to check to see if /etc/mdadm.conf already exists:
# ls -l /etc/mdadm.conf
If the file does not exist, we need to create it and capture the mdraid configuration into /etc/mdadm.conf and verify its contents. (*NOTE* this command will overwrite an existing conf)
# mdadm --detail --scan /dev/md127 > /etc/mdadm.conf && cat /etc/mdadm.conf ARRAY /dev/md127 metadata=1.2 name=fclive:foo UUID=6e707d66:41b19db2:b1439fad:fd56e4fd
If the file already exists, we need to append to the mdraid configuration for our newly created array into /etc/mdadm.conf and verify its contents.
# mdadm --detail --scan /dev/md127 >> /etc/mdadm.conf && cat /etc/mdadm.conf ARRAY /dev/md0 metadata=1.2 name=fclive:sys UUID=17771f82:63a78d76:a7badcd1:bf133289 ARRAY /dev/md127 metadata=1.2 name=fclive:foo UUID=6e707d66:41b19db2:b1439fad:fd56e4fd
Now that we have our config file created, we can check how much disk space is seen and is being utilized by the array.conf
# mdadm --detail /dev/md127 | grep 'Array Size' Array Size : 4190208 (4.00 GiB 4.29 GB)
Now we can go ahead and create an optimally aligned Redhat XFS filesystem on the mdraid device /dev/md127. Items to note on XFS alignment with mdraid or hardware raid devices:
XFS 'su' is used to specify the stripe unit for a RAID device or a striped logical volume (RAID stripe size)
XFS 'sw' is used to specify the stripe width and is a multiplier of the stripe unit, usually the same as the number of stripe members in the logical volume configuration, or data disks in a RAID device.
Thus given the raid level, array geometry and device count in our example we end up with:
Array chunk size = 128K so XFS 'su=128k'
Number of drives in the array is 5 so XFS 'sw=4' (parity raid5 so numdrives - 1)
We can now go ahead and create the filesystem on /dev/md127 using the optimal "su" and "sw" values that we determined.
# mkfs.xfs -L foo -d su=128k,sw=4 /dev/md127 meta-data=/dev/md127 isize=512 agcount=8, agsize=130912 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0 data = bsize=4096 blocks=1047296, imaxpct=25 = sunit=32 swidth=128 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal log bsize=4096 blocks=2560, version=2 = sectsz=512 sunit=32 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0
Now we want to mount the XFS file system at /foo and check out the size.
# mkdir /foo # mount /dev/md127 /foo && df -kh | grep md127 /dev/md127 4.0G 37M 4.0G 1% /foo
Everything looks good so let's go ahead and unmount the filesystem.
# umount /foo
We are now ready to grow the underlying storage so we need to stop and remove the mdraid device /dev/md127.
# mdadm --stop /dev/md127 && mdadm --remove /dev/md127
Now that the array is offline, we can expand the associated virtual block devices. In almost all cases this operation occurs offline and outside the control of the guest operating system.
Our example VM is utilizing the qemu/libvirt virtualized SATA controller, so we can simply hot-plug the virtual SATA block devices to grow them.
Referring back to the lsscsi output from earlier, we see that /dev/sdb, /dev/sdc, dev/sdd, /dev/sde which made up /dev/md127 are attached to SCSI_HOST devices 2:0:0:0, 3:0:0:0, 4:0:0:0 and 5:0:0:0
[root@fclive ~]# lsscsi --size [0:0:0:0] cd/dvd ATA CD-ROM 1.0 /dev/sr0 - [1:0:0:0] disk ATA HARDDISK 1.0 /dev/sda 8.58GB [2:0:0:0] disk ATA HARDDISK 1.0 /dev/sdb 1.07GB [3:0:0:0] disk ATA HARDDISK 1.0 /dev/sdc 1.07GB [4:0:0:0] disk ATA HARDDISK 1.0 /dev/sdd 1.07GB [5:0:0:0] disk ATA HARDDISK 1.0 /dev/sde 1.07GB
At this point, we need to hot-plug our virtual SATA block devices so we can grow them.
First we need to inform the kernel to delete the SATA devices at SCSI_HOST address 2:0:0:0, 3:0:0:0, 4:0:0:0 and 5:0:0:0.
# echo 1 > /sys/class/scsi_device/2\:0\:0\:0/device/delete # echo 1 > /sys/class/scsi_device/3\:0\:0\:0/device/delete # echo 1 > /sys/class/scsi_device/4\:0\:0\:0/device/delete # echo 1 > /sys/class/scsi_device/5\:0\:0\:0/device/delete # echo 1 > /sys/class/scsi_device/6\:0\:0\:0/device/delete
Check to make sure the SATA devices that were at SCSI_HOST address 2:0:0:0, 3:0:0:0, 4:0:0:0, 5:0:0:0 and 6:0:0:0 are gone.
# lsscsi --size [0:0:0:0] cd/dvd ATA CD-ROM 1.0 /dev/sr0 - [1:0:0:0] disk ATA HARDDISK 1.0 /dev/sda 8.58GB
You can proceed to grow the block devices. I won't cover all the possible methods and technologies here (refer back to the brief list of examples above).
With the wave of a hand magically the devices have now grown larger...
The virtual block devices have now been expanded in capacity and it is time to bring them back online. After hot-plugging the devices back onto the VM, the guest OS should pick up on the change in sizes and geometry. If the guest OS does not pick up on the latter we will need to force a rescan.
If you need to force a rescan, one of two methods can be employed:
Method 1 (This method will scan for for new devices):
# echo "- - -" > /sys/class/scsi_host/host2/scan # echo "- - -" > /sys/class/scsi_host/host3/scan # echo "- - -" > /sys/class/scsi_host/host4/scan # echo "- - -" > /sys/class/scsi_host/host5/scan # echo "- - -" > /sys/class/scsi_host/host6/scan
Method 2 (This method will scan for existing and/or missing devices):
# echo 1 > /sys/class/scsi_device/2\:0\:0\:0/device/rescan # echo 1 > /sys/class/scsi_device/3\:0\:0\:0/device/rescan # echo 1 > /sys/class/scsi_device/4\:0\:0\:0/device/rescan # echo 1 > /sys/class/scsi_device/5\:0\:0\:0/device/rescan # echo 1 > /sys/class/scsi_device/6\:0\:0\:0/device/rescan
We should now see our virtual SATA block devices re-attached as /dev/sdb, /dev/sdc, dev/sdd and /dev/sde except that they have now each grown another 1GB in size.
# lsscsi --size [0:0:0:0] cd/dvd ATA CD-ROM 1.0 /dev/sr0 - [1:0:0:0] disk ATA HARDDISK 1.0 /dev/sda 8.58GB [2:0:0:0] disk ATA HARDDISK 1.0 /dev/sdb 2.09GB [3:0:0:0] disk ATA HARDDISK 1.0 /dev/sdc 2.09GB [4:0:0:0] disk ATA HARDDISK 1.0 /dev/sdd 2.09GB [5:0:0:0] disk ATA HARDDISK 1.0 /dev/sde 2.09GB
Chances are that once the virtual drives were re-attached they were recognized as mdraid devices. If md did so, then it most likely restarted the array. However, the md auto scan does not update device size changes in the metadata so we need to manually tell md the devices have grown larger.
We can check to see if this is the case like so:
# mdadm --query /dev/md127 /dev/md127: 4.00GiB raid5 5 devices, 0 spares. Use mdadm --detail for more detail.
Thus if we find the latter, we will need to stop the array and remove again.
# mdadm --stop /dev/md127 && mdadm --remove /dev/md127
Now let's re-assemble our 4 drive raid5 array and inform md that the drives have increased in capacity.
# mdadm --assemble --update=devicesize /dev/md127 mdadm: /dev/md127 has been started with 5 drives.
In the case of a raid5 array we need to have md trigger a resync to grow and claim the additional space.
# mdadm --grow --size=max /dev/md127 mdadm: component size of /dev/md127 has been set to 2046976K
Check to make sure that the additional disk space is seen and is being utilized.
# mdadm --detail /dev/md127 | grep 'Array Size' Array Size : 8187904 (7.81 GiB 8.38 GB)
Once the Resync completes on the array it's time to grow the XFS filesystem on /dev/md127 into the additional space.
# mount /dev/md127 /foo/ && xfs_growfs /foo meta-data=/dev/md127 isize=512 agcount=8, agsize=130912 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1 spinodes=0 rmapbt=0 = reflink=0 data = bsize=4096 blocks=1047296, imaxpct=25 = sunit=32 swidth=128 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal bsize=4096 blocks=2560, version=2 = sectsz=512 sunit=32 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 data blocks changed from 1047296 to 2046976
Lastly we can now check the new mounted size of the XFS filesystem.
# df -kh | grep md127 /dev/md127 7.8G 41M 7.8G 1% /foo
If you have arrived here, congratulations you have now successfully grown your raid5 mdraid array!
Whew! Who knew growing an mdraid array could be so easy - yet so complicated at the same time. Hopefully, this helped to demystify block device expansion concerns in regards to mdraid maintenance and administration. You will now no longer fear upgrading your drives when needed. No more worries about having to acquire and replace your drives with exactly the same capacity and model. Thanks for reading.
Chris Brown is a TAM in the North American region. He has expertise in operating systems, virtualization, storage and systems engineering. Prior to joining Red Hat, Chris formerly worked at GE Healthcare for 16 years where he owned and drove the Linux operating system and hardware platforms for GE Healthcare Diagnostic Imaging products. Find more posts by Chris Brown at https://www.redhat.com/en/about/blog/authors/chris-brown
A Red Hat Technical Account Manager (TAM) is a specialized product expert who works collaboratively with IT organizations to strategically plan for successful deployments and help realize optimal performance and growth. The TAM is part of Red Hat’s world class Customer Experience and Engagement organization and provides proactive advice and guidance to help you identify and address potential problems before they occur. Should a problem arise, your TAM will own the issue and engage the best resources to resolve it as quickly as possible with minimal disruption to your business.
Connect with TAMs at a Red Hat Convergence event near you! Red Hat Convergence is a free, invitation-only event offering technical users an opportunity to deepen their Red Hat product knowledge and discover new ways to apply open source technology to meet their business goals. These events travel to cities around the world to provide you with a convenient, local one-day experience to learn and connect with Red Hat experts and industry peers.