[dm-devel] [PATCH 1/1] MD/DM RAID: Fix hang due to recent RAID5 locking changes

NeilBrown neilb at suse.de
Mon Nov 25 00:03:15 UTC 2013


On Sun, 24 Nov 2013 17:30:43 -0600 Jonathan Brassow <jbrassow at redhat.com>
wrote:

> When commit 773ca82 was made in v3.12-rc1, it caused RAID4/5/6 devices
> that were created via device-mapper (dm-raid.c) to hang on creation.
> This is not necessarily the fault of that commit, but perhaps of the way
> dm-raid.c was setting up and activating devices.
> 
> Device-mapper allows I/O and memory allocations in the constructor
> (i.e. raid_ctr()), but nominal and recovery I/O should not be allowed
> until a 'resume' is issued (i.e. raid_resume()).  It has been problematic
> (at least in the past) to call mddev_resume before mddev_suspend was
> called, but this is how DM behaves - CTR then resume.  To solve the
> problem, raid_ctr() was setting up the structures, calling md_run(), and
> then also calling mddev_suspend().  The stage was then set for raid_resume()
> to call mddev_resume().
> 
> Commit 773ca82 caused a change in behavior during raid5.c:run().
> 'setup_conf->grow_stripes->grow_one_stripe' is called which creates the
> stripe cache and increments 'active_stripes'.
> 'grow_one_stripe->release_stripe' doesn't actually decrement 'active_stripes'
> anymore.  The side effect of this is that when raid_ctr calls mddev_suspend,
> it waits for 'active_stripes' to reduce to 0 - which never happens.
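
For readers following along, the hang described above can be sketched as a toy model in plain C. This is not kernel code: struct toy_conf, its fields, and every toy_* function are hypothetical stand-ins, and the only assumption about the behaviour change is the one the paragraph states (a release made while no worker thread exists is queued instead of decrementing the counter). The slow-path flag mirrors the title of the commit referenced in the reply below.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for raid5 state; not kernel code. */
struct toy_conf {
	int active_stripes;     /* stripes currently counted as in use */
	int deferred_releases;  /* releases queued for the worker thread */
	bool has_thread;        /* models whether mddev->thread is non-NULL */
};

/*
 * Release one stripe. With use_slow_path_without_thread set, a release made
 * while no worker exists is handled immediately; otherwise it is queued and
 * the counter stays elevated until a worker drains the queue.
 */
static void toy_release_stripe(struct toy_conf *conf,
			       bool use_slow_path_without_thread)
{
	if (!conf->has_thread && use_slow_path_without_thread) {
		conf->active_stripes--;   /* slow path: decrement right away */
		return;
	}
	conf->deferred_releases++;        /* fast path: leave it to the worker */
}

/* Models grow_one_stripe: take a reference, then release it. */
static void toy_grow_one_stripe(struct toy_conf *conf,
				bool use_slow_path_without_thread)
{
	conf->active_stripes++;
	toy_release_stripe(conf, use_slow_path_without_thread);
}

/* Models the worker thread draining the queued releases. */
static void toy_thread_run(struct toy_conf *conf)
{
	if (!conf->has_thread)
		return;
	conf->active_stripes -= conf->deferred_releases;
	conf->deferred_releases = 0;
}

/* A real suspend would block here; the model only reports the condition. */
static bool toy_suspend_would_complete(const struct toy_conf *conf)
{
	return conf->active_stripes == 0;
}
```

In this model, growing the stripe cache with no worker thread and no slow path leaves active_stripes equal to the number of stripes created, so the suspend condition is never met. Enabling the slow path, or running the worker once it exists, brings the counter back to zero.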

Hi Jon,
 this sounds like the same bug that is fixed by 

commit ad4068de49862b083ac2a15bc50689bb30ce3e44
Author: majianpeng <majianpeng at gmail.com>
Date:   Thu Nov 14 15:16:15 2013 +1100

    raid5: Use slow_path to release stripe when mddev->thread is null

which is already en route to 3.12.x.  Could you check if it fixes the bug for
you?

Thanks,
NeilBrown

> 
> You could argue that the MD personalities should be able to handle either
> a suspend or a resume after 'md_run' is called, but they can't really handle
> either.  To fix this, I've removed the call to mddev_suspend in raid_ctr and
> I've made the call to the personality's 'quiesce' function within
> mddev_resume dependent on whether the device is currently suspended.
> 
> This patch is suitable and recommended for 3.12.
> 
> Signed-off-by: Jonathan Brassow <jbrassow at redhat.com>
> ---
>  drivers/md/dm-raid.c |    1 -
>  drivers/md/md.c      |    5 ++++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index 4880b69..cdad87c 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -1249,7 +1249,6 @@ static int raid_ctr(struct dm_target *ti, unsigned argc, char **argv)
>  	rs->callbacks.congested_fn = raid_is_congested;
>  	dm_table_add_target_callbacks(ti->table, &rs->callbacks);
>  
> -	mddev_suspend(&rs->md);
>  	return 0;
>  
>  size_mismatch:
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 561a65f..383980d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -359,9 +359,12 @@ EXPORT_SYMBOL_GPL(mddev_suspend);
>  
>  void mddev_resume(struct mddev *mddev)
>  {
> +	int should_quiesce = mddev->suspended;
> +
>  	mddev->suspended = 0;
>  	wake_up(&mddev->sb_wait);
> -	mddev->pers->quiesce(mddev, 0);
> +	if (should_quiesce)
> +		mddev->pers->quiesce(mddev, 0);
>  
>  	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>  	md_wakeup_thread(mddev->thread);
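
The effect of the 'should_quiesce' guard in the quoted md.c hunk can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the md implementation: struct toy_mddev, quiesce_depth, and the toy_* functions are hypothetical, and the model assumes that a suspend issues quiesce(1) so that a resume without a prior suspend issues an unmatched quiesce(0).

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; not kernel code. */
struct toy_mddev {
	bool suspended;
	int quiesce_depth;  /* +1 per quiesce(1), -1 per quiesce(0) */
};

/* Models the personality's quiesce hook by tracking call balance only. */
static void toy_quiesce(struct toy_mddev *m, int state)
{
	m->quiesce_depth += state ? 1 : -1;
}

/* Assumed shape of suspend: mark suspended, then quiesce the personality. */
static void toy_suspend(struct toy_mddev *m)
{
	m->suspended = true;
	toy_quiesce(m, 1);
}

/* Resume as it looked before the quoted patch: always un-quiesce. */
static void toy_resume_unguarded(struct toy_mddev *m)
{
	m->suspended = false;
	toy_quiesce(m, 0);
}

/* Resume as in the quoted patch: un-quiesce only if actually suspended. */
static void toy_resume_guarded(struct toy_mddev *m)
{
	bool should_quiesce = m->suspended;

	m->suspended = false;
	if (should_quiesce)
		toy_quiesce(m, 0);
}
```

With the constructor no longer calling suspend, the first resume arrives on a device that was never suspended. In the model, the unguarded variant drives quiesce_depth negative in that case, while the guarded variant leaves it balanced and still undoes a genuine suspend.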
