[dm-devel] [PATCH 0 of 15] DM RAID: a wrapper target for MD RAID456

Jonathan Brassow jbrassow at redhat.com
Fri Dec 3 19:53:19 UTC 2010


The forthcoming patches are the second iteration (3rd when including
Neil's original drop) of the DM -> MD translation module, "dm-raid".
There have been some minor changes to some of the 9 patches I posted
last time, so I'm just including all of them along with the new patches
I have.

MD patch reversals and fixes (these can go upstream now):
	md-backout-dm-dirty-log.patch
	md-minor-updates.patch
	md-fix-null-pointer-deref.patch
dm-raid module (some reworking of Neil's original patches):
	dm-raid-seed-module.patch
	dm-target-callbacks-and-congestion-fn.patch
	dm-unplug-callback.patch
	dm-raid-iterate_devices-and-io_hints.patch
	dm-raid-suspend-and-resume-fns.patch
	dm-raid-message-fn.patch
New patches for support of separate metadata devices:
	md-new-param-to-calc_dev_sboffset.patch
	md-new-param-to_sync_page_io.patch
	md-separate-meta-and-data-devs.patch
	dm-raid-allow-metadata-devices.patch
	md-new-superblock-type.patch
	md-add-bitmap-support.patch

So far, the metadata stuff (superblock and bitmaps) is partially working.
The part I'm struggling with right now is the bitmap work.  The bitmap is
being created and updated, but I don't know why it is not being consulted
when the array is activated.  For example, if I kill a machine in the 
middle of write operations, I would expect the bitmap to show that there
is some recovery work to be done, but it does not.  So, I have yet to
figure this out.

I've included below a document that describes some of the dm-raid
design and some of the additional work that remains.  I've also attached 
a script that can be used to create RAID devices through device-mapper 
after the kernel patches have been applied and the kernel has been built.

If you're going to start testing, the non-persistent metadata cases should
be pretty solid.  The persistent metadata cases should work except for
full bitmap support.

 brassow


** preliminary design/descriptive doc **

The dm-raid.c code provides a way to access the functionality of MD
through device-mapper.  This allows us to create RAID4/5/6 (and possibly
MD's RAID1) through device-mapper.  Some of the difficult things to get
straight are translating device-mapper's CTR arguments into the proper
MD settings and making sure we are able to access and configure MD's
various options (recovery speed, write_back settings, etc).  The current
proposed dm-raid CTR arguments are:

	The standard first three device-mapper table arguments, where
	the target_type field is "raid"
		<start> <len> raid \

	This is followed by the parameters that specify the RAID type
	and that RAID type's required and optional arguments.
		<raid_type> <#raid_params> <raid_params> \
	The required arguments for each RAID type may be different.
	Currently, they are ('*' indicates currently unsupported RAID):
		*raid1   <#parms> <chunk_size>
		raid4    <#parms> <chunk_size> <rebuild_A>
		raid5_la <#parms> <chunk_size> <rebuild_A>
		raid5_ra <#parms> <chunk_size> <rebuild_A>
		raid5_ls <#parms> <chunk_size> <rebuild_A>
		raid5_rs <#parms> <chunk_size> <rebuild_A>
		raid6_zr <#parms> <chunk_size> <rebuild_A> <rebuild_B>
		raid6_nr <#parms> <chunk_size> <rebuild_A> <rebuild_B>
		raid6_nc <#parms> <chunk_size> <rebuild_A> <rebuild_B>
	Chunk size is in sectors and the 'rebuild' arguments are used to
	specify that a new device has been added to the array and must
	be rebuilt by parity calculations (or copying if RAID1).  The
	'rebuild' arguments are specified as an index of the array
	elements.
	**FIXME: I'd like to remove the 'rebuild' arguments as required
		 arguments and make them optional - specified as
		 'rebuild=<dev index>'.
	Optional arguments include ('*' indicates not implemented):
		[[no]sync]	Force/Prevent RAID initialization 
		*[write_back=<int>]
		*[daemon_sleep=<int>]
		*[stripecache=<int>]
		*[minspeed=<int>]
		*[maxspeed=<int>]
		**[rebuild=<idx>]	**if moved from required args

	Finally, we have the devices that compose the RAID array.  Each
	array element is given as a metadata device and data device pair.
	If there is no metadata device, a '-' is given for the metadata
	device argument.  If a device is known to have failed, a '- -'
	pair can be specified indicating that there is no data or 
	metadata device available for that position in the array.
	#raid_devs refers to the number of pairings.
		<#raid_devs> { <meta_dev1> <dev1> .. <meta_devN> <devN> }
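
To make the argument layout above concrete, here is a minimal sketch
that assembles one table line from its pieces.  It is only an
illustration of the proposed format, not code from the patches: the
helper name build_raid_table and the device paths are hypothetical, and
it assumes <#raid_params> counts the individual parameter tokens.  It
passes the 'rebuild' value through untouched, since the text above does
not say whether indices start at 0 or 1.

```python
def build_raid_table(start, length, raid_type, raid_params, dev_pairs):
    """Return one device-mapper table line in the proposed dm-raid layout.

    raid_params: required and optional RAID arguments, passed through as-is.
    dev_pairs:   (metadata_dev, data_dev) tuples; None is written as '-'.
    """
    params = [str(p) for p in raid_params]
    # <start> <len> raid <raid_type> <#raid_params> <raid_params>
    # Assumption: <#raid_params> is the number of parameter tokens.
    fields = [str(start), str(length), "raid", raid_type, str(len(params))]
    fields += params
    # <#raid_devs> { <meta_dev1> <dev1> .. <meta_devN> <devN> }
    fields.append(str(len(dev_pairs)))
    for meta, data in dev_pairs:
        fields.append("-" if meta is None else meta)
        fields.append("-" if data is None else data)
    return " ".join(fields)


# Hypothetical three-device raid5_ls array with no metadata devices,
# a chunk size of 128 sectors and a 'rebuild' argument of 2.
line = build_raid_table(
    0, 2097152, "raid5_ls", [128, 2],
    [(None, "/dev/sdb1"), (None, "/dev/sdc1"), (None, "/dev/sdd1")],
)
print(line)
```

A failed position in the array would be passed as (None, None), which
the helper writes as the '- -' pair described above.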


When translating the device-mapper CTR arguments to MD settings, there
are three arguments that /must/ be set by device-mapper (dm-raid.c) at
CTR time.  They are:

* mddev->recovery_cp:
Determines the initialization state of the array.  The value determines
how far the array has processed the initial recovery.  (Initial recovery
can be parity calculation for RAID456 or copying drives for RAID1.)

* rdev->flags/In_sync:
Determines the state of an individual device.  If !In_sync, then the
device needs to be rebuilt - until then, it is not a useful member
of the array.

* rdev->recovery_offset:
Like mddev->recovery_cp, only for a single device.

Note that even if the array has not yet been initialized, the 
rdev->flags/In_sync bit is still set if the drives are healthy.  If the
array has not been initialized, you would not want to have a device that
is not 'In_sync'.  This is because no trustworthy recovery could occur
for the device because the array had not yet reached a coherent state.

From dm-raid.c, the CTR arguments that control the above are '[no]sync'
and the rebuild parameters.  There is also a slight difference in
behavior depending on whether metadata devices are specified or not.
When there are no metadata devices specified, we won't be able to tell
if the array was shut down cleanly, so we must assume recovery_cp = 0.
If there is metadata, we will be able to find out if the array was
shut down cleanly, so we can set 'recovery_cp = MaxSector' and let the
settings change if the metadata requires it.

Translations when metadata devices are not specified:
					[ per device settings ]
nosync	sync	rebuild | recovery_cp	In_sync	recovery_offset
------------------------|-------------------------------------
0	0	0	| 0		1	MaxSector
0	0	1	| 0		0	0 (INVALID)
0	1	0	| 0		1	MaxSector
0	1	1	| 0		0	0 (INVALID)
1	0	0	| MaxSector	1	MaxSector
1	0	1	| MaxSector	0	0

Translations when metadata devices are specified:
					[ per device settings ]
nosync	sync	rebuild | recovery_cp	In_sync	recovery_offset
------------------------|-------------------------------------
0	0	0	| MaxSector	1	MaxSector
0	0	1	| MaxSector	0	0
0	1	0	| 0		1	MaxSector
0	1	1	| 0		0	0 (INVALID)
1	0	0	| MaxSector	1	MaxSector
1	0	1	| MaxSector	0	0
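
The two tables above reduce to a few rules, sketched below.  This is
only a restatement of the tables, not the dm-raid.c code: the function
name translate and the MAX_SECTOR constant are stand-ins (the latter
assumes a 64-bit sector_t), and rejecting 'sync' together with 'nosync'
is my assumption, since neither table lists that combination.

```python
from typing import NamedTuple

# Stand-in for the kernel's MaxSector; assumes a 64-bit sector_t.
MAX_SECTOR = (1 << 64) - 1


class Settings(NamedTuple):
    recovery_cp: int      # array-wide: mddev->recovery_cp
    in_sync: bool         # per device: In_sync bit in rdev->flags
    recovery_offset: int  # per device: rdev->recovery_offset
    valid: bool           # False for the rows marked INVALID


def translate(nosync, sync, rebuild, has_metadata):
    """Map the CTR flags to the MD settings listed in the tables."""
    if nosync and sync:
        # Assumption: this combination does not appear in either table.
        raise ValueError("'sync' and 'nosync' given together")
    if has_metadata:
        # Trust the metadata unless a resync is explicitly forced.
        recovery_cp = 0 if sync else MAX_SECTOR
    else:
        # Without metadata, assume an unclean array unless told 'nosync'.
        recovery_cp = MAX_SECTOR if nosync else 0
    in_sync = not rebuild
    recovery_offset = 0 if rebuild else MAX_SECTOR
    # A device cannot be rebuilt from an array that is not yet coherent.
    valid = not (rebuild and recovery_cp == 0)
    return Settings(recovery_cp, in_sync, recovery_offset, valid)


print(translate(nosync=False, sync=False, rebuild=True, has_metadata=False))
print(translate(nosync=False, sync=False, rebuild=True, has_metadata=True))
```

The two calls show the one row where the tables disagree on validity:
a rebuild with neither 'sync' nor 'nosync' is INVALID without metadata
devices but acceptable with them.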


-------------- next part --------------
A non-text attachment was scrubbed...
Name: gime_raid.pl
Type: application/x-perl
Size: 5865 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20101203/8818bbee/attachment.pl>

