[dm-devel] [PATCH 7/7] Hold all write bios when errors are handled

Wed Nov 25 13:19:19 UTC 2009

On Tue, 24 Nov 2009, malahal at us.ibm.com wrote:

> I need to look at the code again, but I thought any new writes to a
> failed region go to a surviving leg. In that case, we end up returning
> I/Os to the application after writing to a single leg.

Writes always go to all the legs; see do_write(). In any case, dmeventd 
removes the failed leg soon afterwards.

> > > Also, we do need to do the above work only if "primary" leg fails. We
> > > can continue to work just like the old code if "secondary" legs fail,
> > > right? Not sure if this is worth optimizing though, but I would like to
> > > see it implemented as it is just a few extra checks. We can have
> > > primary_failure field like log_failure field.
>  
> > I thought about it too, but concluded that we need to hold bios even if 
> > the secondary leg fails.
> > 
> > Imagine this scenario:
> > * secondary leg fails
> > * the write fails on the secondary leg, succeeds on the primary leg,
> > and is reported as successfully completed
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible and the secondary leg is 
> > back online --- now raid1 would be returning stale data.
> 
> The software can detect this case. We can fail this completely or use
> the data from the secondary that could be "stale" with help from admin. 
> Let us call this method 1.

You can't detect it because the computer crashed *before* you could write 
to the metadata that the secondary leg had failed.

So, after a reboot, you can't tell if any mirror leg failed some requests 
before the crash.
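
To make the window concrete, here is a toy Python model of the scenarios 
discussed in this thread. It is only a sketch, not the dm-raid1 code: the 
Mirror class, its fields, and scenario() are all made up for illustration, 
and handle_failure() merely stands in for dmeventd removing the failed leg 
and recording that in the metadata.

```python
class Mirror:
    """Toy two-leg mirror; every name here is hypothetical."""

    def __init__(self):
        self.legs = {"primary": {}, "secondary": {}}  # on-disk contents
        self.failing = set()          # legs currently rejecting writes
        self.recorded_failed = set()  # persistent metadata: legs known bad
        self.acked = {}               # block -> data reported as committed
        self.held = []                # writes waiting for error handling

    def write(self, block, data, hold_on_error):
        errors = False
        for name, disk in self.legs.items():  # writes go to every leg
            if name in self.failing:
                errors = True
            else:
                disk[block] = data
        if errors and hold_on_error:
            self.held.append((block, data))   # do not complete the bio yet
        else:
            self.acked[block] = data          # completion reaches the app

    def handle_failure(self):
        # Record the failed leg first, then release the held writes.
        self.recorded_failed |= self.failing
        for block, data in self.held:
            self.acked[block] = data
        self.held.clear()


def scenario(hold_on_error, handled_before_crash=False):
    m = Mirror()
    m.write(0, "old", hold_on_error)
    m.failing.add("secondary")
    m.write(0, "new", hold_on_error)
    if handled_before_crash:
        m.handle_failure()
    # Crash and reboot: the primary is gone, the secondary is back online.
    if "secondary" in m.recorded_failed:
        visible = None                        # array is inaccessible
    else:
        visible = m.legs["secondary"].get(0)  # metadata shows no failure
    return m.acked.get(0), visible


print(scenario(False))       # ('new', 'old'): committed data reverted
print(scenario(True))        # ('old', 'old'): "new" was never committed
print(scenario(True, True))  # ('new', None): inaccessible, nothing reverted
```

Only the first case returns data older than what the application was told 
had been committed, and nothing in the surviving metadata shows that it 
happened.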

> > If we hold the bios if the secondary leg fails (as the patch does), one of 
> > these two scenarios happens:
> > 
> > * secondary leg fails
> > * write succeeds on the primary leg and is held
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible and the secondary leg is
> > back online --- but we haven't completed the write, so the transaction 
> > wasn't reported as committed
> > 
> > or
> > 
> > * secondary leg fails
> > * write succeeds on the primary leg and is held
> > * dmeventd removes the secondary leg and the write succeeds
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible, the secondary leg was 
> > already removed by dmeventd, so the array is considered inaccessible. So 
> > it doesn't work, but at least it doesn't revert an already committed 
> > transaction.
> 
> How is this latter case (it doesn't need a crash anyway)
> different/better from the case where we detect that 'primary' is missing
> and ask admin if he wants to use the data on the secondary or not. At
> least, the admin has a choice with "method 1" and this doesn't have that
> choice.

If you always ask the admin when the primary leg has failed and wait for 
his decision, you lose fault tolerance --- the computer would be stuck 
until the admin acts.

The requirements are:
* if one of the legs fails or the log fails, you must automatically 
continue without human intervention
* if both legs fail, you must shut it down and not pretend that something 
was written when it wasn't (this would break the durability requirement of 
transactions).
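
Put as a decision rule, the two requirements look roughly like the sketch 
below. The function name, its arguments, and the returned strings are 
hypothetical and chosen only for illustration; they do not correspond to 
anything in the patch.

```python
def next_action(legs_ok, log_ok):
    """legs_ok: one bool per mirror leg; log_ok: is the log device healthy."""
    if not any(legs_ok):
        # No leg took the write: never report it as completed.
        return "shutdown"
    if not all(legs_ok) or not log_ok:
        # A leg or the log failed: carry on without waiting for the admin.
        return "continue_degraded"
    return "continue"


print(next_action([True, False], True))   # continue_degraded
print(next_action([True, True], False))   # continue_degraded
print(next_action([False, False], True))  # shutdown
```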

Mikulas

> Thanks, Malahal.