Guessing RAID sizes, rhbz#587442

While trying to sort out this bug, and learning about the storage
system, I have found, I think, at least part of the problem. I'll try to
summarize as much as I can.

To reproduce the bug:

Create 4 RAID partitions, each with a size evenly divisible by 4M (I have
been using either 2000 or 4000). 4M is the extent size, and LVM rounds
down to the nearest extent.

Create a RAID5 device from all 4 partitions, formatted as an LVM PV, with
no spares.

Create a VG and LV using all the space on top of the RAID5.

When it tries to actually create everything it blows up with the extents

The problem is that the partition sizes that are guessed while creating
things are wrong. After formatting the actual size is queried and there
isn't enough space for the LVM to be created.

When using 2000 as the partition size it guesses that the PV is 5999 and
the VG is 5996. But after setup the PV is actually 5995.5 and the VG,
after rounding down to the nearest extent size, should be 5992 instead
of 5996.
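
To make the rounding concrete, here is a minimal Python sketch of the
arithmetic above. It assumes sizes in MB and a 4MB extent, and it ignores
the PV metadata offset (peStart) that the real code subtracts before
aligning. align_down is a name made up for this example, not a function
from the storage code.

```python
import math

PE_SIZE_MB = 4.0  # extent size from the reproduction steps above


def align_down(size_mb, pe_size_mb=PE_SIZE_MB):
    """Round size_mb down to a whole number of extents (hypothetical helper)."""
    return math.floor(size_mb / pe_size_mb) * pe_size_mb


guessed_vg = align_down(5999)    # VG size derived from the guessed PV size
actual_vg = align_down(5995.5)   # VG size derived from the PV size after setup
print(guessed_vg, actual_vg, guessed_vg - actual_vg)
```

The guess ends up one full extent larger than what the real PV can hold,
which is why an LV sized to fill the guessed VG no longer fits.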

If you put an ext4 filesystem on top of the RAID5 it works fine. I think
this is because ext4 just uses whatever space is available; it isn't
trying to create something of a specific size the way the LV is.

So far I have tracked things back to the initial PV size reported by the
size method of the MDRaidArrayDevice class for RAID5. When the array is
not set up yet, it guesses at the size using the chunk and superblock
values.

It looks like the hard-coded values for chunkSize and superBlockSize are
out of date. When I look at the details for the array, mdadm reports that
the chunk size is 512K, not the 64K that chunkSize is currently set to.

I'm not sure what the super block size should be, but I suspect it
should be a multiple of the chunk size.

I have tried adjusting the chunk and superblock sizes to 512K each, but
even with those changes the guess doesn't account for the large
difference between the guessed size and the actual size of the PV
(5999 vs. 5995.5).
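
As a rough illustration of why the new values do not close the gap, here
is a sketch of the estimate. It assumes the RAID5 branch mirrors the
RAID10 branch visible in the patch below, using (members - 1) in place of
(members / 2.0), and that the superblock size is subtracted from each
member before multiplying. The 128K superblock in the first call is an
assumed value, not one taken from the code, and guess_raid5_size is a
made-up name.

```python
def guess_raid5_size(member_mb, members, chunk_mb, superblock_mb):
    """Estimate the usable size in MB of a RAID5 array that is not set up yet."""
    # Assumption: each member loses one superblock's worth of space.
    usable_member = member_mb - superblock_mb
    # RAID5 spends one member's worth of space on parity.
    size = (members - 1) * usable_member
    # Trim to a whole number of chunks, as the RAID10 branch does.
    size -= size % chunk_mb
    return size


ACTUAL_PV_MB = 5995.5  # size reported after setup, from the numbers above

# 64K chunk with an assumed 128K superblock, then 512K for both.
old = guess_raid5_size(2000, 4, chunk_mb=64 / 1024.0, superblock_mb=128 / 1024.0)
new = guess_raid5_size(2000, 4, chunk_mb=512 / 1024.0, superblock_mb=512 / 1024.0)
print(old - ACTUAL_PV_MB, new - ACTUAL_PV_MB)
```

Under these assumptions the 512K values shrink the guess by only about
1MB, so roughly 3MB of the difference is still unaccounted for.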

So, my question is, does it sound like I'm on the right track? And does
someone who understands RAID know why the actual size is so far off?

I have added some extra size debugging to the LVM and RAID sections. If
you search this traceback dump for 'size ==' you can see the before and
after size differences:


For reference, the log patch looks like this:

diff --git a/storage/devices.py b/storage/devices.py
index 448fa44..3b277c9 100644
--- a/storage/devices.py
+++ b/storage/devices.py
@@ -2014,6 +2014,7 @@ class LVMVolumeGroupDevice(DMDevice):
            lv.size > self.freeSpace:
             raise DeviceError("new lv is too large to fit in free space", self.name)

+        log.debug("Adding %s/%dMB to %s" % (lv.name, lv.size, self.name))

     def _removeLogVol(self, lv):
@@ -2076,6 +2077,7 @@ class LVMVolumeGroupDevice(DMDevice):
         # sum up the sizes of the PVs and align to pesize
         size = 0
         for pv in self.pvs:
+            log.debug("PV size == %s" % pv.size)
             size += max(0, self.align(pv.size - pv.format.peStart))

         return size
@@ -2529,8 +2531,10 @@ class MDRaidArrayDevice(StorageDevice):
             elif self.level == mdraid.RAID10:
                 size = (self.memberDevices / 2.0) * smallestMemberSize
                 size -= size % self.chunkSize
+            log.debug("non-existant RAID %s size == %s" % (self.level, size))
         else:
             size = self.partedDevice.getSize()
+            log.debug("existing RAID %s size == %s" % (self.level, size))

         return size

Brian C. Lane <bcl redhat com>
Red Hat / Port Orchard, WA
