[dm-devel] [PATCH] a deadlock bug in the kernel-side device mapper code

Mike Anderson andmike at linux.vnet.ibm.com
Mon Nov 9 08:51:42 UTC 2009


Mikulas Patocka <mpatocka at redhat.com> wrote:
> Hi
> 
> This is the patch that uses two locks to avoid the deadlock.

Thanks for doing the patch. 

I had previously started trying to address this issue using RCU and moving
dm_copy_name_and_uuid back to being called during dm_build_path_uevent, but
that patch still had a couple of cases left to address.

In testing your patch without moving where dm_copy_name_and_uuid is called,
I ran into an issue during test runs where I hit a BUG_ON for the dm_put in
dm_copy_name_and_uuid, as DMF_FREEING was able to progress (Note: this
failure case occurs without your patch). If the proper dm_get / dm_put
is added to the dm_uevent functions, then there are cases where
dm_uevent_free becomes the last dm_put, resulting in recursion.

Since we are adding this synchronization, it would be good to select a
synchronization type that can be used from dm_build_path_uevent (i.e.,
SOFTIRQ-safe), allowing the call to dm_copy_name_and_uuid to move back to
dm_build_path_uevent.

The test case below normally fails in about 5-10 minutes.

I am running the test case using a spinlock instead of the mutex and
moving dm_copy_name_and_uuid to being called from dm_build_path_uevent. It
has been running for a few hours now. I will continue to let it run.

Should we look to use a spinlock for this read access?

My test case just uses scsi_debug to create a two-path dm mpath device.

1.) modprobe scsi_debug vpd_use_hostno=0 add_host=2
2.) Then in one shell do a loop of "dmsetup remove" and multipath
3.) In another window do a loop of "dmsetup message ... fail_path"
followed by "dmsetup message ... reinstate_path" on the two paths of the
same dm device that is being removed / added.
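
The steps above could be scripted roughly as follows. This is only a sketch
under assumptions: MPATH_DEV, PATH_A and PATH_B are hypothetical placeholders
(the device and path names are not given here), the sector argument of 0 to
"dmsetup message" is assumed, and failing then reinstating each path in turn
is one reading of step 3. It prints the commands by default; set DRY_RUN=0 to
actually run them as root.

```shell
#!/bin/sh
# Sketch of the stress test. All names below are hypothetical placeholders
# and must be set to match the local multipath configuration.
MPATH_DEV="${MPATH_DEV:-mpath_placeholder}"   # dm mpath device name (assumed)
PATH_A="${PATH_A:-path_a_placeholder}"        # first path device (assumed)
PATH_B="${PATH_B:-path_b_placeholder}"        # second path device (assumed)
ITERATIONS="${ITERATIONS:-3}"                 # loop count per run
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it. Failures are
# ignored because the two loops race with each other by design.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@" || true
    fi
}

# Step 1: create two scsi_debug hosts so the same LUN shows up on two paths.
setup_paths() {
    run modprobe scsi_debug vpd_use_hostno=0 add_host=2
}

# Step 2: repeatedly remove the dm device and let multipath recreate it.
remove_add_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        run dmsetup remove "$MPATH_DEV"
        run multipath
        i=$((i + 1))
    done
}

# Step 3: repeatedly fail and reinstate both paths of the same device.
fail_reinstate_loop() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        for p in "$PATH_A" "$PATH_B"; do
            run dmsetup message "$MPATH_DEV" 0 "fail_path $p"
            run dmsetup message "$MPATH_DEV" 0 "reinstate_path $p"
        done
        i=$((i + 1))
    done
}
```

After sourcing the script, setup_paths would be run once, then
remove_add_loop in one shell and fail_reinstate_loop in another, with
ITERATIONS raised high enough to cover the run time needed.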

Note: If someone tries to repeat this testing, I would occasionally hit an
issue in scsi_debug, so for longer test runs I needed to add a patch to
ensure that queued_arr_lock was not reacquired.

Thanks,

-andmike
--
Michael Anderson
andmike at linux.vnet.ibm.com



