Proper alignment between disk HW blocks, mdadm strides, and ext[23] blocks

Justin Piszcz jpiszcz at lucidpixels.com
Thu Nov 15 13:42:49 UTC 2007



On Fri, 9 Nov 2007, Andreas Dilger wrote:

> On Nov 09, 2007  19:11 -0700, Chris Worley wrote:
>> How do you measure/gauge/assure proper alignment?
>>
>> The physical disk has a block structure.  What is it or how do you
>> find it?  I'm guessing it's best to not partition disks in order to
>> assure that whatever its block read/write size is isn't bisected by the
>> partition.
>
> For Lustre we never partition the disks for exactly this reason, and if
> you are using LVM/md on the whole device it doesn't make sense either.
>
>> Then, mdadm has some block structure.  The "-c" ("chunk") is in
>> "kibibytes" (feed the dog kibbles?), with a default of 64.  Not a clue
>> what they're trying to do.
>
> That just means for RAID 0/5/6 that the amount of data or parity in a
> stripe is a multiple of the chunk size, i.e. for a 4+1 RAID5 you get:
>
> 	disk0 disk1 disk2 disk3 disk4
> 	[64kB][64kB][64kB][64kB][64kB]
> 	[64kB][64kB]...
>
>> Finally, mkfs.ext[23] has a "stride", which is defined as a "stripe
>> size" in the man page (and I thought all your stripes added together
>> are a "stride"), as well as a block size.
>
> For ext2/3/4 the stride size (in kB) == the mdadm chunk size.  Note that
> the ext2/3/4 stride size is in units of filesystem blocks, so if you have
> 4kB filesystem blocks (default for filesystems > 500MB) and a 64kB RAID5
> chunk size, this is 16:
>
> 	mke2fs -E stride=16 /dev/md0
>
>> It's important to make sure these all align properly, but their definitions
>> do.
>
> ... do not?
>
>> Could somebody please clarify... with an example?
>
> Yes, I constantly wish the terminology were constant between different tools,
> but sadly there isn't any "proper" terminology out there as far as I've been
> able to see.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Software Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

Quick question, Andreas: if you do not pass -E stride=16 when creating the
filesystem on a RAID5 array, how much worse does performance become on, say,
a 2.0 or 5.0TB ext3 filesystem?
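
For reference, the arithmetic behind stride=16 can be sketched as below. This
is only an illustration using the numbers quoted above (64kB chunk, 4kB
blocks, 4+1 RAID5); the variable names are made up for the example, and the
mke2fs line is printed rather than executed. The -j flag (create an ext3
journal) is my assumption, since the quoted command does not include it.

```shell
#!/bin/sh
# Values taken from the example quoted above.
chunk_kib=64      # mdadm chunk size (-c), in KiB
block_kib=4       # filesystem block size, in KiB
data_disks=4      # 4+1 RAID5: four data chunks plus one parity chunk per stripe

# stride is the chunk size expressed in filesystem blocks
stride=$((chunk_kib / block_kib))

# data held by one full stripe, parity chunk excluded
stripe_data_kib=$((chunk_kib * data_disks))

echo "stride=${stride} blocks, ${stripe_data_kib} KiB of data per full stripe"

# Only print the command; actually running it would destroy data on /dev/md0.
echo "mke2fs -j -E stride=${stride} /dev/md0"
```

With a different chunk or block size only the first two variables change, and
the chunk size should be a whole multiple of the block size for the division
to be exact.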

Justin.



