Expanding the capacity of a Red Hat Enterprise Linux MD raid part 2.

25 de agosto de 2017Chris Brown12 minutos (tempo de leitura)

Continuing with our multi-part topic, we again find ourselves deep in the dark realm of storage subsystems. Once again, we will be venturing below the application layer and into the filesystem where all your data resides. This time, we’ll again be digging into the land of Red Hat Enterprise Linux software raid (mdraid) and looking at Raid10 expansion with an XFS filesystem.

If you did not catch our prior foray into these areas, check out Expanding the capacity of a Red Hat Enterprise Linux MD raid part 1.

Inevitably, at some point during the course of managing and/or re-configuring backing storage devices, capacities can change. Generally speaking, in most cases, the capacity increases. This can happen for a variety of reasons:

An HW raid volume drive upgrade/replacement with larger drive(s)
- Example: Online capacity expansion of a LSI Megaraid Raid1 volume
Expansion of a remote ISCSI, AOE or FIBRE Channel block device
An underlying block or file backed storage device attached to a Virtual Machine has been expanded
The contents of a device which is part of an existing mdraid device has been cloned to, and replaced with a larger device
Some other scenario not listed here where a block storage device has been grown in size

Any of the latter block storage devices can be used with Red Hat Enterprise Linux software raid (mdraid). Generally, the configuration of a mdraid device is commonly regarded as somewhat static in nature. Most of the time mdraid devices are configured from underlying storage devices and are sent on their way to become a physical member (PV) of a Logical Volume Manager group (VG). Device expansion, specifically with the likes of LVM, is relatively easy. However, with mdraid, capacity expansion is not generally a topic covered very well, nor is it especially well documented. That said, with a bit of black magic, nerves of steel and a steady hand it is possible. So let's go ahead and dig in using a virtual machine for today’s example.

Before we begin, here are a few disclaimers:

Expanding a mdraid will always involve the filesystem/mdraid device being taken offline.
As with any potentially risky change to a storage subsystem, ensure you have appropriate backups of your data!

Today’s scenario finds us wanting to expand a md raid10 array formatted with an XFS filesystem.

Grow a MD Raid10 array consisting of four devices with an xfs filesystem.

First we need to find out about the block devices present in the system:

# lsscsi --size
[0:0:0:0]    cd/dvd  ATA      CD-ROM      1.0   /dev/sr0        -
[1:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sda   8.58GB
[2:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdb   1.07GB
[3:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdc   1.07GB
[4:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdd   1.07GB
[5:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sde   1.07GB

It looks like /dev/sdb, /dev/sdc, /dev/sdd, and /dev/sde would be good candidates to use, but let's check first and make sure they are not mounted and being used.

# mount | grep sd[abcde]
/dev/sda1 on / type ext4 (rw,relatime,seclabel,data=ordered)

After checking we see that /dev/sdb, /dev/sdc, /dev/sdd, and /dev/sde are free and clear to use, so we can go ahead and create our raid10 array with them.

# mdadm --create /dev/md127 --name=foo --level=raid10 --raid-devices=4 --chunk=128K /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md127 started.

Now that we have created a new array, we need to create or check for the presence of /etc/mdadm.conf, if not already existing. By default, mdadm does not write out a configuration file (/etc/mdadm.conf). It instead relies upon scanning /proc/partitions, device superblock/bitmaps data and /proc/mdstat to assemble any arrays which are found.

First we need to check to see if /etc/mdadm.conf already exists:

# ls -l /etc/mdadm.conf

If the file does not exist, we need to create it and capture the mdraid configuration into /etc/mdadm.conf and verify its contents. (*NOTE* this command will overwrite an existing conf)

# mdadm --detail --scan /dev/md127 > /etc/mdadm.conf && cat /etc/mdadm.conf 
ARRAY /dev/md127 metadata=1.2 name=fclive:foo UUID=726c8d47:f0fb76f5:f8817049:c1372f6c

If the file already exists, we need to append to the mdraid configuration for our newly created array into /etc/mdadm.conf and verify its contents.

# mdadm --detail --scan /dev/md127 >> /etc/mdadm.conf && cat /etc/mdadm.conf
ARRAY /dev/md0 metadata=1.2 name=fclive:sys UUID=17771f82:63a78d76:a7badcd1:bf133289
ARRAY /dev/md127 metadata=1.2 name=fclive:foo UUID=726c8d47:f0fb76f5:f8817049:c1372f6c

Now that we have our config file created, we can check how much disk space is seen and is being utilized by the array.

# mdadm --detail /dev/md127 | grep 'Array Size'
    Array Size : 2095104 (2046.00 MiB 2145.39 MB)

Now we can go ahead and create an optimally aligned Redhat XFS filesystem on the mdraid device /dev/md127.

Remarks on XFS alignment with mdraid or hardware raid devices:

XFS 'su' is used to specify the stripe unit for a RAID device or a striped logical volume (RAID stripe size)
XFS 'sw' is used to specify the stripe width and is a multiplier of the stripe unit, usually the same as the number of stripe members in the logical volume configuration, or data disks in a RAID device.

Thus given the raid level, array geometry and device count in our example we end up with:

Array chunk size = 128K so XFS 'su=128k'
Number of drives in the array is 4 in Raid 10 so XFS 'sw=2' (stripe of mirrors so numdrives /2)

We can now go ahead and create the filesystem on /dev/md127 using the optimal “su” and “sw” values that we determined.

# mkfs.xfs -L foo -d su=128k,sw=2 /dev/md127
meta-data=/dev/md127             isize=512    agcount=8, agsize=65440 blks
        =                       sectsz=512   attr=2, projid32bit=1
        =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data    =                       bsize=4096   blocks=523520, imaxpct=25
        =                       sunit=32     swidth=64 blks
naming  =version 2              bsize=4096   ascii-ci=0 ftype=1
log     =internal log           bsize=4096   blocks=2560, version=2
        =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Now we want to mount the XFS file system at /foo and check out the size.

# mkdir /foo
# mount /dev/md127 /foo && df -kh | grep md127
/dev/md127      2.0G   35M  2.0G   2% /foo

Everything looks good so let's go ahead and unmount the filesystem.

# umount /foo

We are now ready to grow the underlying storage so we need to stop and remove the mdraid device /dev/md127.

# mdadm --stop /dev/md127 && mdadm --remove /dev/md127

Now that the array is offline, we can expand the associated virtual block devices. In almost all cases this operation occurs offline and outside the control of the guest operating system.

Our example VM is utilizing the qemu/libvirt virtualized SATA controller, so we can simply "hot-plug" the virtual SATA block devices to grow them.

Referring back to the lsscsi output from earlier, we see that /dev/sdb, /dev/sdc, dev/sdd, /dev/sde which made up /dev/md127 are attached to SCSI_HOST devices 2:0:0:0, 3:0:0:0, 4:0:0:0 and 5:0:0:0

[root@fclive ~]# lsscsi --size
[0:0:0:0]    cd/dvd  ATA      CD-ROM      1.0   /dev/sr0        -
[1:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sda   8.58GB
[2:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdb   1.07GB
[3:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdc   1.07GB
[4:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdd   1.07GB
[5:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sde   1.07GB

At this point, we need to “hot-plug” our virtual SATA block devices so we can grow them.

First we need to inform the kernel to delete the SATA devices at SCSI_HOST address 2:0:0:0, 3:0:0:0, 4:0:0:0 and 5:0:0:0.

# echo 1 > /sys/class/scsi_device/2\:0\:0\:0/device/delete 
# echo 1 > /sys/class/scsi_device/3\:0\:0\:0/device/delete
# echo 1 > /sys/class/scsi_device/4\:0\:0\:0/device/delete 
# echo 1 > /sys/class/scsi_device/5\:0\:0\:0/device/delete

Check to make sure the SATA devices that were at SCSI_HOST address 2:0:0:0, 3:0:0:0, 4:0:0:0 and 5:0:0:0 are gone.

# lsscsi --size
[0:0:0:0]    cd/dvd  ATA      CD-ROM      1.0   /dev/sr0        -
[1:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sda   8.58GB

You can proceed to grow the block devices. I won't cover all the possible methods and technologies here (refer back to the brief list of examples above).

With the wave of a hand magically the devices have now grown larger...

The virtual block devices have now been expanded in capacity and it is time to bring them back online. After hot-plugging the devices back on to the VM, the guest OS should pick up on the change in sizes and geometry. If the guest OS does not pick up on the latter we will need to force a rescan.

If you need to force a rescan, one of two methods can be employed:

Method 1 (This method will scan for for new devices):

# echo "- - -" > /sys/class/scsi_host/host2/scan
# echo "- - -" > /sys/class/scsi_host/host3/scan
# echo "- - -" > /sys/class/scsi_host/host4/scan
# echo "- - -" > /sys/class/scsi_host/host5/scan

Method 2 (This method will scan for existing and/or missing devices):

# echo 1 > /sys/class/scsi_device/2\:0\:0\:0/device/rescan
# echo 1 > /sys/class/scsi_device/3\:0\:0\:0/device/rescan
# echo 1 > /sys/class/scsi_device/4\:0\:0\:0/device/rescan
# echo 1 > /sys/class/scsi_device/5\:0\:0\:0/device/rescan

We should now see our virtual SATA block devices re-attached as /dev/sdb, /dev/sdc, dev/sdd and /dev/sde except that they have now each grown another 1GB in size.

# lsscsi --size
[0:0:0:0]    cd/dvd  ATA      CD-ROM      1.0   /dev/sr0        -
[1:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sda   8.58GB
[2:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdb   2.14GB
[3:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdc   2.14GB
[4:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sdd   2.14GB
[5:0:0:0]    disk    ATA      HARDDISK    1.0   /dev/sde   2.14GB

Chances are that once the virtual drives were re-attached they were recognized as mdraid devices. If md did so, then it most likely restarted the array. However, the md auto scan does not update device size changes in the metadata so we need to manually tell md the devices have grown larger.

We can check to see if this is the case like so:

# mdadm --query /dev/md127
/dev/md127: 2046.00MiB raid10 4 devices, 0 spares. Use mdadm --detail for more detail.

Thus if we find the latter, we will need to stop the array and remove again.

# mdadm --stop /dev/md127 && mdadm --remove /dev/md127

Now let's re-assemble our 4 drive raid10 array and inform md that the drives have increased in capacity.

# mdadm --assemble --update=devicesize /dev/md127
mdadm: /dev/md127 has been started with 4 drives.

In the case of a raid10 array, we need to have md trigger a resync to grow and claim the additional space.

# mdadm --grow --size=max /dev/md127
mdadm: component size of /dev/md127 has been set to 2096128K

Check to make sure that the additional disk space is seen and is being utilized.

# mdadm --detail /dev/md127 | grep 'Array Size'
    Array Size : 4192256 (4.00 GiB 4.29 GB)

Once the Resync completes on the array it's time to grow the XFS filesystem on /dev/md127 into the additional space.

# mount /dev/md127 /foo/ && xfs_growfs /foo
meta-data=/dev/md127            isize=512    agcount=8, agsize=65440 blks
        =                       sectsz=512   attr=2, projid32bit=1
        =                       crc=1        finobt=1 spinodes=0 rmapbt=0
        =                       reflink=0
data    =                       bsize=4096   blocks=523520, imaxpct=25
        =                       sunit=32     swidth=64 blks
naming  =version 2              bsize=4096   ascii-ci=0 ftype=1
log     =internal               bsize=4096   blocks=2560, version=2
        =                       sectsz=512   sunit=32 blks, lazy-count=1
realtime =none                  extsz=4096   blocks=0, rtextents=0
data blocks changed from 523520 to 1048064

Lastly we can now check the new mounted size of the XFS filesystem.

# df -kh | grep md127
/dev/md127      4.0G   38M  4.0G   1% /foo

If you have arrived here, congratulations you have now successfully grown your raid10 mdraid array!

Closing thoughts

Whew! Who knew growing an mdraid array could be so easy - yet so complicated at the same time. Hopefully, this helped to demystify block device expansion concerns in regards to mdraid maintenance and administration. You will now no longer fear upgrading your drives when needed. No more worries about having to acquire and replace your drives with exactly the same capacity and model. Thanks for reading and stay tuned for part three where we will look at expanding an md raid 5 array with an XFS file-system.

Chris Brown is a TAM in the North American region. He has expertise in operating systems, virtualization, storage and systems engineering. Prior to joining Red Hat, Chris formerly worked at GE Healthcare for 16 years where he owned and drove the Linux operating system and hardware platforms for GE Healthcare Diagnostic Imaging products. Find more posts by Chris Brown at https://www.redhat.com/en/about/blog/authors/chris-brown

A Red Hat Technical Account Manager (TAM) is a specialized product expert who works collaboratively with IT organizations to strategically plan for successful deployments and help realize optimal performance and growth. The TAM is part of Red Hat’s world class Customer Experience and Engagement organization and provides proactive advice and guidance to help you identify and address potential problems before they occur. Should a problem arise, your TAM will own the issue and engage the best resources to resolve it as quickly as possible with minimal disruption to your business.

Connect with TAMs at a Red Hat Convergence event near you! Red Hat Convergence is a free, invitation-only event offering technical users an opportunity to deepen their Red Hat product knowledge and discover new ways to apply open source technology to meet their business goals. These events travel to cities around the world to provide you with a convenient, local one-day experience to learn and connect with Red Hat experts and industry peers.