[dm-devel] [PATCH 7/7] Hold all write bios when errors are handled

Wed Nov 25 22:47:36 UTC 2009

On 11/25/09 15:23, malahal at us.ibm.com wrote:
> Mikulas Patocka [mpatocka at redhat.com] wrote:
>>
>>>> Imagine this scenario:
>>>> * secondary leg fails
>>>> * write fails on the secondary leg and succeeds on the primary leg 
>>>> and is successfully complete
>>>> * the computer crashes
>>>> * after a reboot, the primary leg is inaccessible and the secondary leg is 
>>>> back online --- now raid1 would be returning stale data.
>>>
>>> The software can detect this case. We can fail this completely or use
>>> the data from the secondary that could be "stale" with help from admin. 
>>> Let us call this method 1.
>>
>> You can't detect it because the computer crashed *before* you write the 
>> information that the secondary leg failed to the metadata.
>>
>> So, after a reboot, you can't tell if any mirror leg failed some requests 
>> before the crash.
> 
> My definition of 'primary' is the first leg. Now on, I will use "first
> leg" to avoid confusion.  On a reboot, LVM can find if its first leg is
> missing. If it is missing, it can ask the admin whether to use the
> 'second' leg or not. When I said, "software" can detect, I really meant
> that LVM can detect that the "first leg" is missing.

I have been thinking again about the scenario Mikulas pointed out. It looks
like a double failure (failures on both legs), so human intervention would
be acceptable. However, how do we know whether the second leg contains valid
data?

There might be two cases.

  1) The system crashed during write operations without any disk failures,
     and the first leg fails at the next boot.

     We can use the secondary leg because the data on it is valid.

  2) The system crashed after the secondary leg failed, and at the next boot
     the first leg fails and the secondary leg comes back online.

     We can't use the secondary leg because its data might be stale.

I haven't checked the contents of the log disk, but I guess we can't tell
these cases apart from the log disk. Another possibility I thought of was
error messages. If any error messages for the secondary leg were recorded,
we could judge that the secondary leg contains stale data, but I suspect this
is not reliable because syslog might not have been written to disk before the
system crashed.
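
To make the problem concrete, here is a rough toy model of the two cases
(Python, only a sketch). The names run(), failure_on_disk and
hold_writes_until_recorded are made up for illustration and do not correspond
to anything in the dm-raid1 code. It assumes the only thing recovery can
consult after the reboot is a failure record that reached the disk before
the crash.

```python
def run(second_leg_fails, hold_writes_until_recorded):
    """Toy model: returns (failure record visible at reboot, second leg stale)."""
    first, second = 1, 1        # both legs start with version 1 of a block
    failure_on_disk = False     # hypothetical persistent "second leg failed" record

    # A write of version 2 arrives.
    first = 2
    if not second_leg_fails:
        second = 2
    elif hold_writes_until_recorded:
        # The write is held until the failure has been recorded on disk.
        failure_on_disk = True
    # The write is acknowledged here, and then the system crashes.

    # Reboot: the first leg is missing and the second leg is back online.
    stale = second != first
    return failure_on_disk, stale


print(run(False, False))  # case 1: no record, second leg valid
print(run(True, False))   # case 2: no record either, but second leg stale
print(run(True, True))    # case 2 with the write held until the record is on disk
```

In this model, cases 1 and 2 look the same at reboot unless the failure
record is on disk before the write is acknowledged.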

I would like to improve system availability by keeping the system running
when the secondary leg fails, but we need to make sure this case is handled.

I appreciate your comments.

Thanks,
Taka