
Re: [dm-devel] 2.6.2-udm2

On Wed, Feb 18, 2004 at 01:17:14PM +0100, Lars Marowsky-Bree wrote:
> On 2004-02-18T13:01:29,
>    Heinz Mauelshagen <mauelshagen redhat com> said:
> > > - The test interval is no longer passed in with the target
> > > parameters (tools writers take note).
> > > 
> > > - The kernel now does no path testing at all, let userland do it
> > > (see thread on lkml).
> > 
> > I'm not convinced that we can recover from OOM situations with either
> > approach (userspace testing or dm testing of failed paths).
> I'm thinking that if you have run yourself into such a failure - ie all
> paths currently down, swap on the failed m-p device, OOM _and_ needing
> to allocate memory / swapping - the system is in a very very sick state
> anyway. Handling it perfectly may just not be possible.

Of course it is in a pretty sick state at that point in time.

But it is a state the admin wants to get the system out of
by repairing at least one path and reusing it.
Even though the system would be able to run IO through that path,
we wouldn't have a chance of activating it, which is bad.

> > But if *all* paths of the multipath target to test are failed *and*
> > the system is OOM, the driver that is called to queue the test io can
> > sleep while allocating memory (either calling [kv]malloc() directly or
> > indirectly through mempools).
> > 
> > That memory allocation is in danger of deadlocking, because the pageouts
> > it needs involve the very multipathed target we want to unfail.
> > 
> > The 'workaround' for this, which is reloading the table in order to set
> > all paths to operational again, would involve memory allocation as well
> > :(
> Yes. I actually see no way around this, except to tap into a general
> 'emergency' memory pool. I thought there was something like it, but I
> forgot the name ;-) Doesn't pvmove use the same?

No, we use it to avoid low memory situations while we suspend devices.
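
To illustrate the 'emergency pool' idea raised above, here is a minimal
userspace sketch. It is not dm or LVM code, and it is only loosely modeled
on the kernel mempool concept: a fixed reserve is set aside up front, the
normal allocator is tried first, and the reserve is used only when that
fails. All names (epool, EPOOL_SLOTS, simulate_oom) are hypothetical, and
the simulate_oom flag exists only to exercise the fallback path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define EPOOL_SLOTS 4        /* hypothetical reserve size */
#define EPOOL_OBJ_SIZE 256   /* hypothetical fixed object size */

struct epool {
    unsigned char slots[EPOOL_SLOTS][EPOOL_OBJ_SIZE]; /* reserved up front */
    bool in_use[EPOOL_SLOTS];
    bool simulate_oom;       /* test hook: pretend malloc() fails */
};

void epool_init(struct epool *p)
{
    for (size_t i = 0; i < EPOOL_SLOTS; i++)
        p->in_use[i] = false;
    p->simulate_oom = false;
}

/* Try the normal allocator first; fall back to the reserve on failure. */
void *epool_alloc(struct epool *p)
{
    void *obj = p->simulate_oom ? NULL : malloc(EPOOL_OBJ_SIZE);

    if (obj)
        return obj;

    for (size_t i = 0; i < EPOOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return p->slots[i];
        }
    }
    /* Reserve exhausted. This sketch gives up; a real pool would have
     * to wait until some other user returns an object. */
    return NULL;
}

/* Return an object to the reserve if it came from there, else free() it. */
void epool_free(struct epool *p, void *obj)
{
    for (size_t i = 0; i < EPOOL_SLOTS; i++) {
        if (obj == (void *)p->slots[i]) {
            assert(p->in_use[i]);
            p->in_use[i] = false;
            return;
        }
    }
    free(obj);
}
```

Such a reserve only helps if every allocation on the path reactivation
route draws from it, and if objects taken from it are returned without
needing further memory.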

Heinz    -- The LVM Guy --

> Sincerely,
>     Lars Marowsky-Brée <lmb suse de>
> -- 
> High Availability & Clustering	      \ ever tried. ever failed. no matter.
> SUSE Labs			      | try again. fail again. fail better.
> Research & Development, SUSE LINUX AG \ 	-- Samuel Beckett
> --
> dm-devel mailing list
> dm-devel redhat com
> https://www.redhat.com/mailman/listinfo/dm-devel

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***


Heinz Mauelshagen                                 Red Hat, Inc.
Consulting Development Engineer                   Am Sonnenhang 11
                                                  56242 Marienrachdorf
Mauelshagen RedHat com                            +49 2626 141200
                                                       FAX 924446
