[dm-devel] [RFC][PATCH 0/3] dm-raid1: fix deadlock at suspend after suspend was interrupted

Takahiro Yasui tyasui at redhat.com
Wed Jan 20 22:58:04 UTC 2010


Hi Ueda-san,

Kiyoshi Ueda wrote:
> On 01/20/2010 05:40 AM +0900, Takahiro Yasui wrote:
>> Hi,
>>
>> This is a patch set to fix deadlock on suspending of mirror device.
>>
>>
>> ISSUE
>> =====
>>
>> Suspend procedure on a dm-mirror device could cause deadlock on recovery_count
>> semaphore.
>>
>> When mirror_presuspend is called, the recovery_count semaphore is acquired in
>> dm_rh_stop_recovery() to stop the recovery routine, but when a signal is caught
>> in dm_wait_for_completion() or an error occurs in dm_suspend(),
>> the suspend process is interrupted without releasing the recovery_count semaphore
>> of the mirror device. If another suspend is then executed,
>> that suspend process gets stuck at dm_rh_stop_recovery().
>>
>> When suspend procedure is interrupted, the device should work properly since
>> the status of the device is not "suspended."
>>
>>
>> SOLUTION
>> ========
>>
>> Introduce a target handler, cancel_presuspend, to cancel status changes
>> done by a target specific presuspend handler.
> 
> How about using ->resume as a cancelling method?
> Though you have to audit existing targets' ->resume handler,
> I think it's better idea than adding another target handler
> just for this purpose.

The resume method performs the whole resume procedure, but when suspend is
interrupted, the postsuspend handler has not been run. So the requirement
is only to undo the state changes made by the presuspend handler. If the whole
resume procedure is executed, dm-log, at least, will have a problem.

The mirror log is flushed in the postsuspend handler, so the log disk might
contain stale data at the moment suspend is interrupted. If the resume handler
is used instead of a cancel_presuspend handler, the log data in memory will be
overwritten by the stale data on disk.
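
To illustrate the stale-log problem, here is a rough userspace model in C. It
is not kernel code: struct log_model and the model_log_* functions are
hypothetical names, and a single 32-bit word stands in for the per-region log
state. It only assumes what is stated above: postsuspend flushes memory to
disk, and resume reloads memory from disk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per region, set = region marked in the log. */
struct log_model {
	uint32_t mem_bits;   /* in-memory copy of the log */
	uint32_t disk_bits;  /* copy stored on the log disk */
};

/* Update only the in-memory copy, as happens between flushes. */
void model_log_mark(struct log_model *l, unsigned int region)
{
	assert(region < 32);
	l->mem_bits |= UINT32_C(1) << region;
}

/* Models postsuspend: flush the in-memory log to disk. */
void model_log_postsuspend(struct log_model *l)
{
	l->disk_bits = l->mem_bits;
}

/* Models resume: reload the in-memory log from disk. */
void model_log_resume(struct log_model *l)
{
	l->mem_bits = l->disk_bits;
}
```

If a region is marked with model_log_mark() and model_log_resume() is then
called without model_log_postsuspend() in between, the mark is lost, which
corresponds to an interrupted suspend followed by a full resume.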

I'm afraid that we would need to modify each target's resume handler so that
it works properly even when called after the presuspend handler has run but
before the postsuspend handler.
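
For reference, the state left behind by presuspend alone can be sketched with
a small userspace model in C. This is not the kernel implementation:
struct rh_model and the model_* functions are hypothetical, a plain counter
stands in for the recovery_count semaphore, and the model assumes that
presuspend takes max_recovery units of the semaphore and that a cancel
handler gives the same number back. Where the kernel would sleep in down(),
the model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the region hash state of a mirror. */
struct rh_model {
	int max_recovery;    /* assumed number of units taken at presuspend */
	int recovery_count;  /* models the counting semaphore value */
};

void model_init(struct rh_model *rh, int max_recovery)
{
	assert(max_recovery > 0);
	rh->max_recovery = max_recovery;
	rh->recovery_count = max_recovery;
}

/*
 * Models presuspend: take every unit of the semaphore.
 * Returns false where a real down() would block forever.
 */
bool model_presuspend(struct rh_model *rh)
{
	int taken;

	for (taken = 0; taken < rh->max_recovery; taken++) {
		if (rh->recovery_count == 0) {
			/* Roll back so the model stays consistent. */
			rh->recovery_count += taken;
			return false;
		}
		rh->recovery_count--;
	}
	return true;
}

/* Models a cancel_presuspend handler: release what presuspend took. */
void model_cancel_presuspend(struct rh_model *rh)
{
	int i;

	for (i = 0; i < rh->max_recovery; i++)
		rh->recovery_count++;
}
```

In this model, a second model_presuspend() after an interrupted suspend
returns false, while calling model_cancel_presuspend() first lets it succeed
without touching any log state.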

Please let me know if I have overlooked something.

> And in your dm-raid1 patch, cancelling log's presuspend which is used
> by dm-log-userspace is missed.

Thank you for pointing this out. Yes, dm-log-userspace should also be handled.
I will fix it.

Thanks,
Taka



