Re: [dm-devel] more multipath deadlocks -- this time involving memory
- From: christophe varoqui <christophe varoqui free fr>
- To: device-mapper development <dm-devel redhat com>
- Cc: 'Alasdair G Kergon' <agk redhat com>
- Subject: Re: [dm-devel] more multipath deadlocks -- this time involving memory
- Date: Thu, 24 Mar 2005 00:10:02 +0100
Just to let you know I'm not ignoring your comments and analysis.
I opened the 0.4.4-pre* festival, and hope we can fix these nasties
before the end of this cycle.
I started the branch with an id cache in multipath. I'm not sure about the
design, so comments are welcome.
As seen with Lars, I'll then continue moving bits from multipath/ to
libmultipath/ until the daemon can be switched to
libmultipath:multipath() instead of exec(/sbin/multipath).
That, plus a logging rework that is under discussion with open-iscsi
guys, should address most of your concerns.
A mempool would have to wait for another release, if at all desirable.
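
One direction such a logging rework could take is sketched below. This is only an assumption for illustration, not something settled in this thread: the daemon threads hand each message to the log socket with MSG_DONTWAIT and drop it when the receiver is not draining its queue, so a stalled syslogd cannot stall checkerloop. `log_try_send` and `demo_stalled_logger` are hypothetical names, and a socketpair stands in for the real syslog socket.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of multipath-tools): try to hand one log
 * message to a datagram socket without ever sleeping.
 * Returns 1 if sent, 0 if dropped because the receiver is not draining
 * its queue, -1 on any other error.
 */
static int log_try_send(int fd, const char *msg)
{
	ssize_t n = send(fd, msg, strlen(msg), MSG_DONTWAIT);

	if (n >= 0)
		return 1;
	if (errno == EAGAIN || errno == EWOULDBLOCK)
		return 0;	/* a blocking send would have slept here */
	return -1;
}

/*
 * Demonstration: the peer end is never read, which mimics a stalled
 * log daemon. Returns the number of messages accepted before the first
 * drop, or -1 on error or if no drop was ever observed.
 */
static int demo_stalled_logger(void)
{
	int sv[2];
	int sent = 0;
	int rc = 1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* Bounded loop: stop at the first drop or error. */
	while (sent < 1000000) {
		rc = log_try_send(sv[0], "checker: path state changed\n");
		if (rc != 1)
			break;
		sent++;
	}

	close(sv[0]);
	close(sv[1]);
	return rc == 0 ? sent : -1;
}
```

The trade-off is that messages are lost while the log daemon is stuck; a small in-memory ring that is flushed later would keep them, at the cost of more code in the daemon.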
On Mon, 2005-03-21 at 21:34 -0500, goggin, edward wrote:
> Looks like some troublesome deadlock issues involving multipath, memory,
> and all-paths-down use cases. While one might typically expect such a use
> case to result in errors, deadlock is not to be expected. Furthermore, for
> destructive ucode upgrades of an EMC CLARiion storage system, it is expected
> that for a short period of time, all paths to the storage system in question
> appear to a host to be failed. It is expected that any multipathing solution
> will ride through this NDU scenario without a problem.
> While I see three separate instances of the problem being plausible, I have
> seen the first problem instance described below. The second and third
> require high levels of memory contention which I have not spent significant
> The first problem scenario involves a deadlock between multipathd and
> syslogd. The second scenario involves the potential for multipathd,
> or any of the executables invoked by multipath, to be deadlocked while doing
> synchronous page reclamation while allocating memory pages for user or
> kernel heap memory in a system with a high degree of memory contention
> and several multipath mapped devices in an all-paths-down failure state.
> The third scenario is extremely similar to the second but involves the need
> to allocate pages not for heap memory but to swap in working set pages for
> the multipathd, multipath, or any of the executables invoked by multipath.
> First, it seems like __every__ time I try an NDU of the EMC CLARiion ucode,
> one of the two (checkerloop or waiterloop) multipathd sub-threads gets
> blocked in unix_wait_for_peer waiting to send a syslog message through
> a UNIX domain socket to syslogd. Unfortunately, syslogd is blocked in
> blk_congestion_wait waiting for the number of dirty pages in the page
> cache to drop below a pre-defined threshold while it was trying to write log
> info to its /var/log/messages log file. Unfortunately, getting this to
> happen is dependent on the multipathd checkerloop thread periodically checking
> path connectivity and invoking multipath in order to reconfigure multipath
> maps and/or re-enable some now valid paths. Since the multipathd
> waiterloop event thread will deadlock on the multipathd allpaths mutex
> currently owned by the checkerloop thread, starting i/o on a failed path
> will not free up the log jam. Assuming enough free memory is available
> to do so, manually running multipath often resolves the problem. Yet,
> this is hardly a work around to be recommended to an enterprise customer.
> I have only been able to avoid this deadly embrace by killing syslogd
> before starting the test. Without syslogd running, I made it through
> this test 3 consecutive times. It seems that I cannot get through the test
> at all with syslogd running. I think simply changing syslogd to do direct
> instead of page cache buffered i/o to its log file(s) will avoid this
> problem. I am running with the 2.6.11-rc3-upd2 kernel and 0.4.3-pre9
> multipath tools, by the way.
> The 2nd scenario involves blockable user or kernel memory allocation
> requiring page write-out of dirty pages on multipath mapped devices in the
> synchronous page reclaim algorithm of __alloc_pages. Seems to me that while
> mlockall can pin all current and future pages of a process's working set, it
> does not prevent synchronous page reclamation by the process as part of a
> page allocation request. If many of the mapped devices are in queue
> mode due to failed paths on a storage system which is queuing failed bios (as
> the EMC CLARiion must), multipathd, multipath, or any executable invoked by
> multipath could block trying to page out dirty pages to these mapped devices
> while trying to allocate memory before being able to inform the kernel
> multipath components of the existence of valid paths for these mapped devices.
> The 3rd scenario involves the need to mlockall for all executables which are
> invoked by multipathd (multipath) and the executables invoked by these
> executables (scsi_id, /bin/false, ...). Otherwise, any of these executables
> could block during page reclamation while trying to allocate free pages. Also,
> does the effect of mlockall survive in the parent beyond the fork/clone call,
> or does it need to be renewed afterwards?
> Overall, it seems like the code path to test and restore a target path of a
> mapped device should not require any blockable memory allocations. This
> would of course rule out fork/clone/exec. A possible alternative design to
> the current one is to pre-allocate or reserve the memory requirements for
> these tasks -- enough memory for testing and restoring a single path to a
> single LU at a time. While this design would be tuned specifically to this
> job, I think it would not need to be kernel resident.
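
The pre-allocation idea quoted above can be sketched in user space roughly as follows. This is a minimal illustration under stated assumptions, not the multipath-tools design: the names `pool_init`, `pool_get` and `pool_put`, the slot count and the slot size are all hypothetical. The buffers are reserved statically, touched once at startup so their pages exist, and pinned with mlock() where the process is allowed to, so the path check and restore code never has to call malloc.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define POOL_SLOTS 4	/* hypothetical: buffers needed for one path check */
#define SLOT_SIZE  4096	/* hypothetical: one page per buffer */

static unsigned char pool_mem[POOL_SLOTS][SLOT_SIZE];
static int pool_used[POOL_SLOTS];

/*
 * Call once at daemon startup, before any path can fail.
 * Writing to every page forces it to be allocated now rather than
 * during an all-paths-down event. Returns 1 if the pool could also be
 * pinned with mlock(), 0 if not (for example, RLIMIT_MEMLOCK too low).
 */
static int pool_init(void)
{
	memset(pool_mem, 0, sizeof(pool_mem));
	memset(pool_used, 0, sizeof(pool_used));
	return mlock(pool_mem, sizeof(pool_mem)) == 0;
}

/*
 * Hand out one pre-reserved buffer, or NULL when all are in use.
 * Never allocates. Not thread safe: callers are assumed to hold
 * whatever lock already serializes path checking.
 */
static void *pool_get(void)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!pool_used[i]) {
			pool_used[i] = 1;
			return pool_mem[i];
		}
	}
	return NULL;
}

/* Return a buffer obtained from pool_get(); other pointers are ignored. */
static void pool_put(void *p)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (p == (void *)pool_mem[i]) {
			pool_used[i] = 0;
			return;
		}
	}
}
```

A caller would run pool_init() once, then bracket each path test with pool_get() and pool_put(). Exhaustion shows up as a NULL return that the caller can retry on the next checker pass, instead of a blocking allocation.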
christophe varoqui <christophe varoqui free fr>