[dm-devel] [PATCH 1/1] MD/DM RAID: Fix hang due to recent RAID5 locking changes

Brassow Jonathan jbrassow at redhat.com
Tue Nov 26 14:32:36 UTC 2013


On Nov 25, 2013, at 11:27 PM, NeilBrown wrote:

> On Mon, 25 Nov 2013 13:08:56 -0600 Brassow Jonathan <jbrassow at redhat.com>
> wrote:
> 
>> 
>> On Nov 25, 2013, at 8:20 AM, Brassow Jonathan wrote:
>> 
>>> 
>>> On Nov 24, 2013, at 6:03 PM, NeilBrown wrote:
>>> 
>>>> On Sun, 24 Nov 2013 17:30:43 -0600 Jonathan Brassow <jbrassow at redhat.com>
>>>> wrote:
>>>> 
>>>>> When commit 773ca82 was merged in v3.12-rc1, it caused RAID4/5/6 devices
>>>>> created via device-mapper (dm-raid.c) to hang on creation.  This is not
>>>>> necessarily the fault of that commit, but perhaps of the way dm-raid.c
>>>>> was setting up and activating devices.
>>>>> 
>>>>> Device-mapper allows I/O and memory allocations in the constructor
>>>>> (i.e. raid_ctr()), but normal and recovery I/O should not be allowed
>>>>> until a 'resume' is issued (i.e. raid_resume()).  It has been problematic
>>>>> (at least in the past) to call mddev_resume before mddev_suspend has been
>>>>> called, but that is how DM behaves - constructor (CTR), then resume.  To
>>>>> solve the problem, raid_ctr() sets up the structures, calls md_run(), and
>>>>> then also calls mddev_suspend().  The stage is then set for raid_resume()
>>>>> to call mddev_resume().
>>>>> 
>>>>> Commit 773ca82 changed the behavior of raid5.c:run().
>>>>> 'setup_conf->grow_stripes->grow_one_stripe' is called, which populates the
>>>>> stripe cache and increments 'active_stripes', but
>>>>> 'grow_one_stripe->release_stripe' no longer decrements 'active_stripes'.
>>>>> The side effect is that when raid_ctr() calls mddev_suspend(), it waits
>>>>> for 'active_stripes' to drop to 0 - which never happens.
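>>>>> 
>>>>> As a rough illustration only - this is a simplified user-space model of the
>>>>> accounting, not the actual md/raid5 code, and the kernel names are borrowed
>>>>> purely for readability - the hang boils down to:
>>>>> 
>>>>>   #include <stdatomic.h>
>>>>>   #include <stdio.h>
>>>>> 
>>>>>   /* stand-in for conf->active_stripes */
>>>>>   static atomic_int active_stripes = 0;
>>>>> 
>>>>>   /* models grow_one_stripe() after 773ca82: the count goes up, but the
>>>>>    * fast-path release_stripe() no longer brings it back down */
>>>>>   static void grow_one_stripe(void)
>>>>>   {
>>>>>           atomic_fetch_add(&active_stripes, 1);
>>>>>           /* release_stripe() fast path: no decrement any more */
>>>>>   }
>>>>> 
>>>>>   /* models the wait that mddev_suspend() ends up doing via raid5 quiesce */
>>>>>   static void mddev_suspend(void)
>>>>>   {
>>>>>           while (atomic_load(&active_stripes) != 0)
>>>>>                   ;       /* never satisfied -> raid_ctr() hangs here */
>>>>>   }
>>>>> 
>>>>>   int main(void)
>>>>>   {
>>>>>           grow_one_stripe();      /* raid_ctr() -> md_run() -> setup_conf() */
>>>>>           printf("active_stripes = %d, suspending...\n",
>>>>>                  atomic_load(&active_stripes));
>>>>>           mddev_suspend();        /* never returns */
>>>>>           return 0;
>>>>>   }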
>>>> 
>>>> Hi Jon,
>>>> this sounds like the same bug that is fixed by 
>>>> 
>>>> commit ad4068de49862b083ac2a15bc50689bb30ce3e44
>>>> Author: majianpeng <majianpeng at gmail.com>
>>>> Date:   Thu Nov 14 15:16:15 2013 +1100
>>>> 
>>>>  raid5: Use slow_path to release stripe when mddev->thread is null
>>>> 
>>>> which is already en route to 3.12.x.  Could you check if it fixes the bug for
>>>> you?
>>> 
>>> Sure, I'll check.  Just from the subject of the patch, I have high hopes.  The slow path decrements 'active_stripes', and the missing decrement is what was causing the problem above...  I'll make sure, though.
>> 
>> Yes, this patch fixes the issue in 3.12-rc1+.
>> 
>> However, I'm still chasing another problem, which was introduced in commit 566c09c (at least, that's where bisecting points).
>> 
>> The problem only shows up when I have taken a snapshot of a RAID5 device and only if I have cycled the device before adding the snapshot:
>> 1> lvcreate --type raid5 -i 3 -L 20M -n lv vg
>> 2> lvchange -an vg/lv
>> 3> lvchange -ay vg/lv
>> 4> lvcreate -s vg/lv -L 50M -n snap
>> 5> lvchange -an vg/lv
>> 6> lvchange -ay vg/lv -- BUG: line 292 of raid5.c
>> 
>> The current bug triggers on the 'BUG_ON(atomic_read(&conf->active_stripes)==0)' in do_release_stripe().  I'm not sure why yet.
>> 
>> brassow
> 
> I've had a look and I must say I'm not sure either.
> I keep wondering if something is wrong with the locking in get_active_stripe.
> The region covered by device_lock is now much smaller, with the whole function
> now covered by hash_locks[hash].  I cannot see a problem with the locking, but
> I might be missing something.  A missing atomic_inc of active_stripes in there
> could cause your problem.
> 
> Since you can reproduce this easily, could you try expanding the range covered
> by device_lock to cover the whole branch where sh is not NULL?  If that makes a
> difference it would be quite instructive.  I don't hold out much hope, though.
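> 
> Roughly, what I have in mind is something like this - a sketch typed from
> memory, not the exact upstream code and not compile-tested, so treat the
> surrounding details as approximate:
> 
> 	} else {
> 		/* widened: take device_lock for the whole sh != NULL branch */
> 		spin_lock(&conf->device_lock);
> 		if (atomic_read(&sh->count)) {
> 			BUG_ON(!list_empty(&sh->lru)
> 			    && !test_bit(STRIPE_EXPANDING, &sh->state));
> 		} else {
> 			if (!test_bit(STRIPE_HANDLE, &sh->state))
> 				atomic_inc(&conf->active_stripes);
> 			BUG_ON(list_empty(&sh->lru));
> 			list_del_init(&sh->lru);
> 		}
> 		spin_unlock(&conf->device_lock);
> 	}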

Sure, I'll try that.

I've also found that I hit the BUG() on line 693 of raid5.c:get_active_stripe().

thanks,
 brassow




