[dm-devel] [PATCH] DM RAID: Add support for MD RAID10 personality

Brassow Jonathan jbrassow at redhat.com
Wed Jul 4 03:20:29 UTC 2012


On Jul 3, 2012, at 8:21 PM, NeilBrown wrote:

> On Tue, 26 Jun 2012 07:03:51 -0500 Jonathan Brassow <jbrassow at redhat.com>
> wrote:
> 
>> dm raid: add md raid10 support
>> 
>> Support the MD RAID10 personality through dm-raid.c
>> 
>> Signed-off-by: Jonathan Brassow <jbrassow at redhat.com>
>> 
>> Index: linux-upstream/drivers/md/dm-raid.c
>> ===================================================================
>> --- linux-upstream.orig/drivers/md/dm-raid.c
>> +++ linux-upstream/drivers/md/dm-raid.c
>> @@ -11,6 +11,7 @@
>> #include "md.h"
>> #include "raid1.h"
>> #include "raid5.h"
>> +#include "raid10.h"
>> #include "bitmap.h"
>> 
>> #include <linux/device-mapper.h>
>> @@ -52,7 +53,11 @@ struct raid_dev {
>> #define DMPF_MAX_RECOVERY_RATE 0x20
>> #define DMPF_MAX_WRITE_BEHIND  0x40
>> #define DMPF_STRIPE_CACHE      0x80
>> -#define DMPF_REGION_SIZE       0X100
>> +#define DMPF_REGION_SIZE       0x100
>> +#define DMPF_RAID10_NEAR_COPIES 0x200
>> +#define DMPF_RAID10_FAR_COPIES  0x400
>> +#define DMPF_RAID10_FAR_OFFSET  0x800
>> +
>> struct raid_set {
>> 	struct dm_target *ti;
>> 
>> @@ -66,6 +71,15 @@ struct raid_set {
>> 	struct raid_dev dev[0];
>> };
>> 
>> +/* near_copies in first byte */
>> +/* far_copies in second byte */
>> +/* far_offset in 17th bit */
>> +#define ALGORITHM_RAID10(near_copies, far_copies, far_offset) \
>> +	((near_copies & 0xFF) | ((far_copies & 0xFF) << 8) | ((!!far_offset) << 16))
>> +#define RAID10_NC(layout) (layout & 0xFF)
>> +#define RAID10_FC(layout) ((layout >> 8) & 0xFF)
>> +#define RAID10_FO(layout) (layout & 0x10000)
>> +
>> /* Supported raid types and properties. */
>> static struct raid_type {
>> 	const char *name;		/* RAID algorithm. */
>> @@ -76,6 +90,8 @@ static struct raid_type {
>> 	const unsigned algorithm;	/* RAID algorithm. */
>> } raid_types[] = {
>> 	{"raid1",    "RAID1 (mirroring)",               0, 2, 1, 0 /* NONE */},
>> +	{"raid10",   "RAID10 (striped mirrors)",        0, 2, 10, -1 /* Varies */},
>> +	{"raid1e",   "RAID1E (Enhanced RAID1)",         0, 2, 10, -1 /* Varies */},
>> 	{"raid4",    "RAID4 (dedicated parity disk)",	1, 2, 5, ALGORITHM_PARITY_0},
>> 	{"raid5_la", "RAID5 (left asymmetric)",		1, 2, 5, ALGORITHM_LEFT_ASYMMETRIC},
>> 	{"raid5_ra", "RAID5 (right asymmetric)",	1, 2, 5, ALGORITHM_RIGHT_ASYMMETRIC},
>> @@ -339,10 +355,17 @@ static int validate_region_size(struct r
>>  *    [max_write_behind <sectors>]	See '-write-behind=' (man mdadm)
>>  *    [stripe_cache <sectors>]		Stripe cache size for higher RAIDs
>>  *    [region_size <sectors>]           Defines granularity of bitmap
>> + *
>> + * RAID10-only options:
>> + *    [raid10_near_copies   <# copies>] Near copies. (Default: 2)
>> + *    [raid10_far_copies    <# copies>] Far copies.  (Default: 1)
>> + *    [raid10_far_offset    <0/1>]      Offset is device size(0) or stripe(1).
> 
> Can I suggest that you don't do it like this?  i.e. don't copy the
> mistakes I made :-)
> 
> I don't think there is any value in supporting multiple near and far copies.
> There are two dimensions of the layout:
> - number of copies.  Defaults to 2
> - location of the copies:  near, far, offset
> 
> Some day I could implement an alternate version of 'far' or 'offset' which
> improves redundancy slightly.
> Instead of 
>   A B C D
>   ...
>   D A B C
> 
> it would be
> 
>   A B C D 
>   ....
>   B A D C
> 
> i.e. treat the devices as pair and swap the device for the second copy.
> This doesn't generalise to an odd number of devices, but increases the number
> of pairs of devices that can concurrently fail without losing date.
> (for 4 devices, there are 6 pairs.  With current 'far' mode there are only 2
> pair of devices that can concurrently fail (0,2 and 1,3).  With the proposed
> far mode there are 4 (0,2 0,3 1,2, 1,3).
> 
> Adding this with your current proposal would be messy.
> Adding it with the two dimensions I suggest would simply involve adding
> another 'location' - 'farswap' or 'far2' or something.
> 
> I note you didn't make 'dm-raid1e' a module alias.  Was that deliberate?

The missed module alias was an oversight, thanks for catching that.  (Similar - as you can see from this patch - to the oversight I had when introducing raid1. :)  I have gone back and forth on whether to include the alternate "raid1e" or not.  There are different RAID1E algorithms, as described in the kernel doc.  Perhaps there should also be aliases similar to raid5 and raid6 - like raid1e_as (i.e. RAID1E - Adjacent Stripe), etc.  However, there are several combinations that don't have a real name.  Any thoughts?  I would be just fine leaving "raid1e" out as I would leaving it in.  These days, RAID10 is really a subset of RAID1E - perhaps I should be considering taking "raid10" out.  ;)

I like your suggestion of changing the parameter names.  I've found the original names somewhat confusing.  ('far_offset' seems to imply to me that the copy would not be the very next stripe, but _offset_ somehow - it seems to have the reverse meaning to me.  I think this comes from the fact that it acts as a modifier to 'far_copy'.)  I toyed with a couple different ways of doing this but figured it was best to just go along.  Anyway, what you are suggesting seems to be:
	raid10_copies <number> (Default: 2)
	raid10_layout <string> (Default: "near"/"adjacent")
Where <string> could be "near", "far", "offset" and "some-future-thing".  That seems nice to me and seems to clear up some of the confusion caused by "far_offset" seeming to be a modifier to "far_copies".

Let me know what you think about the above, and I'll get v2 ready.

 brassow

P.S.  While it doesn't affect this singular patch, I see people posting their set of patches as a reply to the summary '0 of' patch.  This keeps things together better, and I'll assume this is the way I should post from now on.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20120703/92060a79/attachment.htm>


More information about the dm-devel mailing list