[dm-devel] 2.6.2-udm2

Wed Feb 18 13:25:02 UTC 2004

On 2004-02-18T09:38:38,
   Patrick Mansfield <patmans at us.ibm.com> said:

> Yes, but memory starvation (that will be recovered from) is better than an
> oops, and in some cases better than application downtime.

Iff it will be recovered from later (and quickly enough, before we die
completely anyway).

In such scenarios, quick failover to another node may be the best
option. That means reporting the error upwards ASAP, not delaying it
for the length of some arbitrary timeout.

It's a policy decision, though, and I'd repeat my point that blocking
isn't the best policy.

This can even be divided into two aspects:

- Queuing IO while all paths are down may be the right approach for
  some systems, if you have the memory to queue, that is, and don't
  want quick failover. This should be doable.

- Queuing IO in OOM situations while the swap sits on the affected
  multipath (m-p) device, however, causes more pain than it solves.
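To make the split between the two aspects concrete, here is a minimal sketch of such a policy, assuming a device can see its number of usable paths, an admin-chosen "queue when all paths are down" setting, and a "low memory" signal. All names (`IoAction`, `PathState`, `queue_when_all_down`, `low_memory`, `decide`) are hypothetical illustrations, not part of device-mapper or any kernel API.

```python
from dataclasses import dataclass
from enum import Enum


class IoAction(Enum):
    DISPATCH = "dispatch"  # at least one path is usable
    QUEUE = "queue"        # hold the IO until a path returns
    FAIL = "fail"          # report the error upwards at once


@dataclass
class PathState:
    active_paths: int           # hypothetical: number of usable paths
    queue_when_all_down: bool   # hypothetical: admin prefers queuing over quick failover
    low_memory: bool            # hypothetical: system is short of memory (near OOM)


def decide(state: PathState) -> IoAction:
    """Hypothetical policy for an IO submitted to a multipath device."""
    if state.active_paths > 0:
        return IoAction.DISPATCH
    if not state.queue_when_all_down:
        # Fail fast so that a cluster manager can fail over to another node.
        return IoAction.FAIL
    if state.low_memory:
        # Queuing costs memory we do not have; if swap lives on this
        # device, queued swap-out IO cannot free anything until a path returns.
        return IoAction.FAIL
    return IoAction.QUEUE


if __name__ == "__main__":
    print(decide(PathState(0, queue_when_all_down=True, low_memory=False)))
```

The ordering encodes the argument above: queuing is only an opt-in for systems that have the memory for it, and memory pressure overrides that opt-in rather than blocking indefinitely.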

I recall that there was some storage system which actively caused such
broken scenarios because of its 'rolling upgrade' semantics. But I
think that's a problem they ought to fix in firmware, not in the OS
;-)

Sincerely,
    Lars Marowsky-Brée <lmb at suse.de>

-- 
High Availability & Clustering	      \ ever tried. ever failed. no matter.
SUSE Labs			      | try again. fail again. fail better.
Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett