[dm-devel] [PATCH] a deadlock bug in the kernel-side device mapper code
Mike Anderson
andmike at linux.vnet.ibm.com
Mon Nov 9 08:51:42 UTC 2009
Mikulas Patocka <mpatocka at redhat.com> wrote:
> Hi
>
> This is the patch that uses two locks to avoid the deadlock.
Thanks for doing the patch.
I had previously started trying to address this issue using RCU and moving
dm_copy_name_and_uuid back to being called during dm_build_path_uevent, but
that patch still had a couple of cases to be addressed.
In testing your patch without moving where dm_copy_name_and_uuid is called,
I ran into an issue during test runs where I hit a BUG_ON for the dm_put in
dm_copy_name_and_uuid because DMF_FREEING was able to progress (Note: this
failure case occurs without your patch). If the proper dm_get / dm_put is
added to the dm_uevent functions, then there are cases where dm_uevent_free
becomes the last dm_put, resulting in recursion.
Since we are adding this synchronization anyway, it would be good to select
a synchronization type that can be called from dm_build_path_uevent (i.e.,
one that is SOFTIRQ-safe), which would allow the call to
dm_copy_name_and_uuid to move back to dm_build_path_uevent.
The test case below normally fails in about 5-10 minutes.
I am now running the test case with a spinlock instead of the mutex and
with the call to dm_copy_name_and_uuid moved into dm_build_path_uevent. It
has been running for a few hours now, and I will let it continue.
Should we look to use a spinlock for this read access?
My test case just uses scsi_debug to create a two-path dm mpath device.
1.) modprobe scsi_debug vpd_use_hostno=0 add_host=2
2.) Then in one shell do a loop of "dmsetup remove" and multipath
3.) In another window do a loop of "dmsetup message ... fail_path"
followed by "dmsetup message ... reinstate_path" on the two paths of the
same dm device that is being removed / added.
Note: If someone tries to repeat this testing, I would occasionally hit an
issue in scsi_debug, so for longer test runs I needed to add a patch to
ensure that queued_arr_lock was not reacquired.
Thanks,
-andmike
--
Michael Anderson
andmike at linux.vnet.ibm.com