[dm-devel] [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted

Kiyoshi Ueda k-ueda at ct.jp.nec.com
Wed Jan 20 02:47:56 UTC 2010


Hi Yasui-san,

On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
> Hi,
> 
> This is a patch set to fix a deadlock when suspending a mirror device.
> 
> 
> ISSUE
> =====
> 
> The suspend procedure on a dm-mirror device can deadlock on the recovery_count
> semaphore.
> 
> When mirror_presuspend is called, the recovery_count semaphore is acquired in
> dm_rh_stop_recovery() to stop the recovery routine. But when a signal is caught
> in dm_wait_for_completion(), or an error occurs in dm_suspend(),
> the suspend process is interrupted without releasing the recovery_count semaphore
> of the mirror device. This means that when another suspend is executed,
> that suspend process gets stuck at dm_rh_stop_recovery().
> 
> When the suspend procedure is interrupted, the device should keep working
> properly, since the status of the device is not "suspended."
> 
> 
> SOLUTION
> ========
> 
> Introduce a target handler, cancel_presuspend, to cancel the status changes
> made by a target-specific presuspend handler.

How about using ->resume as the cancelling method?
Though you would have to audit the existing targets' ->resume handlers,
I think it's a better idea than adding another target handler
just for this purpose.

Also, your dm-raid1 patch does not cancel the log's presuspend, which is used
by dm-log-userspace.
So it seems that dm-raid1 can use ->resume to cancel presuspend.
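
To illustrate the idea, here is a rough userspace model, not kernel code.
Everything in it (struct mirror_model, the model_* functions, the enum)
is a hypothetical name made up for this sketch, and recovery_count is
reduced to a plain counter with an initial value of 1. A "would block
forever" case is reported as a return value instead of actually blocking.
It only shows that an interrupted suspend leaves the count held unless
the resume-style handler is called to undo presuspend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the state discussed in this thread. */
struct mirror_model {
	int recovery_count;     /* stands in for the recovery_count semaphore */
	bool recovery_stopped;  /* true while presuspend holds the count */
};

enum suspend_result {
	SUSPEND_OK,
	SUSPEND_INTERRUPTED,
	SUSPEND_WOULD_DEADLOCK, /* a real down() would sleep forever here */
};

static void model_init(struct mirror_model *m)
{
	m->recovery_count = 1;
	m->recovery_stopped = false;
}

/* Models presuspend: take the count, or report that we would block. */
static bool model_presuspend(struct mirror_model *m)
{
	if (m->recovery_count == 0)
		return false;
	m->recovery_count--;
	m->recovery_stopped = true;
	return true;
}

/* Models a ->resume handler that also undoes a pending presuspend. */
static void model_resume(struct mirror_model *m)
{
	if (m->recovery_stopped) {
		m->recovery_stopped = false;
		m->recovery_count++;
	}
}

/*
 * Models one suspend attempt. "interrupted" stands for a signal or error
 * after presuspend has run; "cancel_with_resume" selects whether the
 * caller then invokes the resume handler to cancel presuspend.
 */
static enum suspend_result model_suspend(struct mirror_model *m,
					 bool interrupted,
					 bool cancel_with_resume)
{
	if (!model_presuspend(m))
		return SUSPEND_WOULD_DEADLOCK;

	if (interrupted) {
		if (cancel_with_resume)
			model_resume(m);
		return SUSPEND_INTERRUPTED;
	}

	return SUSPEND_OK;
}
```

In this model, an interrupted suspend without the cancel step makes the
next suspend report SUSPEND_WOULD_DEADLOCK, while calling the resume
handler on the interrupted path lets the next suspend succeed. Whether
the real ->resume handlers are safe to call in that state is exactly the
audit mentioned above.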

Thanks,
Kiyoshi Ueda



