
Re: [libvirt] udevadm settle can take too long

[ CC to Cole ]

Osier Yang wrote:
On 2012-04-24 03:47, Guido Günther wrote:
On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:

http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager
times out on the initial connection to libvirt.

I reassigned the bug back to libvirt. I still wonder what triggers this
though for some users but not for others?
  -- Guido

The basic problem is that, while checking storage volumes,
virt-manager causes libvirt to call "udevadm settle".  There's an
interaction where libvirt's earlier use of network namespaces (to probe
LXC features) had caused some uevents to be sent that get filtered out
before they reach udev.  This confuses "udevadm settle" a bit, and so
it sits there waiting for a 2-3 minute built-in timeout before returning.
Eventually libvirtd prints:
   2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
and virt-manager prints:
   2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
and the connection gets dropped.

One workaround could be to specify a shorter timeout when doing the
settle.  The patch appended below allows virt-manager to work,
although the connection still has to wait for the 10 second timeout
before it succeeds.  I don't know what a better solution would be,
though.  It seems the udevadm behavior might not be considered a bug
from the udev/kernel point of view:

I'm using Linux 3.2.14 with libvirt 0.9.11.  You can trigger the
udevadm issue using a program I posted at the Debian bug report link


 From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
From: Jim Paris<jim jtan com>
Date: Sun, 22 Apr 2012 14:35:47 -0400
Subject: [PATCH] shorten udevadm settle timeout

Otherwise, udevadm settle can take so long that connections from
e.g. virt-manager will get closed.
  src/util/util.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/util/util.c b/src/util/util.c
index 6e041d6..dfe458e 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
  void virFileWaitForDevices(void)
  # ifdef UDEVADM
-    const char *const settleprog[] = { UDEVADM, "settle", NULL };
+    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };

I don't have a good idea for fixing it either, but I suspect this
change could make "lvremove" fail again because of the udev race.

See BZs:


It seems that those bugs were caused by something like

1. open(lv, O_RDWR)
2. close(lv)
3. system("lvremove ...")

where udev would fire off a command between 2 and 3 that caused 3 to
fail.  Adding "udevadm settle" as step 2.5 is a good way to wait for
that command to finish, but:

- it doesn't necessarily fix the issue; something could easily re-open
   the device between 2.5 and 3 and cause the same failure.


- the race condition sounds like it was a short window, and sometimes
   the original sequence would still work even without the settle.
   That would suggest to me that a timeout of 10s is still plenty long.

A few thoughts:

- For lvremove: can we try a short timeout (3 seconds), then if the
   lvremove still fails, try again with the default udevadm timeout
   (120 seconds)?

- Even in that case, we need to fix libvirtd to not kill the
   connection after 30 seconds when it's libvirtd's fault that the
   connection is blocked for so long anyway.

Perhaps we need a configurable timeout property for the client
connection, rather than hardcoding it to 30s.
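
The short-then-default retry suggested for lvremove above could look
roughly like the sketch below. It is only an illustration: the
settle_fn and remove_fn callbacks and both function names are
hypothetical and do not exist in libvirt; the 3 and 120 second values
are the ones proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Timeouts in seconds as proposed in the thread: a short first wait,
 * then the default udevadm timeout quoted above. */
static const int settle_timeouts[] = { 3, 120 };
#define N_SETTLE_ATTEMPTS (sizeof(settle_timeouts) / sizeof(settle_timeouts[0]))

/* Hypothetical callbacks, not existing libvirt functions. */
typedef void (*settle_fn)(int timeout_secs, void *opaque);
typedef int (*remove_fn)(void *opaque);   /* 0 on success, -1 on failure */

/* Settle timeout for the given attempt, or -1 when no attempts remain. */
int settle_timeout_for_attempt(size_t attempt)
{
    return attempt < N_SETTLE_ATTEMPTS ? settle_timeouts[attempt] : -1;
}

/* Settle with the short timeout and try the removal; only if that
 * fails, settle again with the long timeout and retry once. */
int remove_with_settle_retry(settle_fn settle, remove_fn remove_lv, void *opaque)
{
    size_t attempt;
    int timeout;

    assert(settle != NULL && remove_lv != NULL);

    for (attempt = 0;
         (timeout = settle_timeout_for_attempt(attempt)) >= 0;
         attempt++) {
        settle(timeout, opaque);
        if (remove_lv(opaque) == 0)
            return 0;
    }
    return -1;
}
```

With this shape the common case pays at most the short wait, and the
long wait is only spent when the removal has actually failed.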

- When connecting with virt-manager, is the udevadm settle really
   necessary?  We're not calling lvremove.

virt-manager's hang is most likely caused by the pool refresh, which
uses "udevadm settle" to wait for new devices to show up. So
it isn't related to "lvremove".

Besides "logical" storage, the "disk", "scsi", and "mpath" storage
types use "udevadm settle" too, as does the node device driver.

Generally the pool refresh is invoked when libvirtd starts, and
of course it can also be invoked explicitly. :-) I.e.
virt-manager won't hang if it doesn't intend to refresh the
pool. So I guess the situation gets much worse if pools of type
"disk", "logical", "scsi", and "mpath" all exist together.

I'm wondering whether virt-manager tries to refresh the pools when
it starts, or only when the user explicitly requests "check storage"
(e.g. by clicking some button). If it's the first case, it should be
improved IMHO (letting the user get the connection first, and
refreshing the pool only when necessary, would be better).

I agree that introducing a timeout argument for "udevadm settle"
would be better, but hardcoding a timeout in
"virFileWaitForDevices" is not good: as we can see, it's used in
many places, and what the proper timeout is for each of them is an
open question.
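
One way to avoid the hardcoded value would be to let each caller pass
its own timeout. The sketch below only builds the argument vector; the
function name, the SETTLE_ARGV_MAX constant, and the convention that a
negative value means "use udevadm's default" are assumptions for
illustration, and the fallback UDEVADM path is a placeholder for the
value libvirt's build system normally supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Placeholder path; in libvirt UDEVADM comes from the build system. */
#ifndef UDEVADM
# define UDEVADM "/sbin/udevadm"
#endif

/* program, "settle", "--timeout", value, NULL */
#define SETTLE_ARGV_MAX 5

/*
 * Fill argv for "udevadm settle". A negative timeout_secs means no
 * --timeout option is passed, so udevadm uses its built-in default.
 * buf holds the formatted timeout and must outlive argv.
 * Returns the number of arguments, not counting the NULL terminator.
 */
size_t build_settle_argv(int timeout_secs, char *buf, size_t buflen,
                         const char *argv[SETTLE_ARGV_MAX])
{
    size_t n = 0;

    assert(buf != NULL && buflen > 0 && argv != NULL);

    argv[n++] = UDEVADM;
    argv[n++] = "settle";
    if (timeout_secs >= 0) {
        snprintf(buf, buflen, "%d", timeout_secs);
        argv[n++] = "--timeout";
        argv[n++] = buf;
    }
    argv[n] = NULL;
    return n;
}
```

A pool refresh could then ask for a short wait while a caller that
depends on the device node existing could keep the default.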

On the other hand, even if small timeouts are introduced, it's
still possible to hang for a long time while "checking storage"
(refreshing all the pools together). I don't have much of an idea
how to get rid of that entirely. But how about only refreshing the
pool selected by the user? The maximum waiting time in that case
would be 2 min, rather than ($num_of_pools * 2) min.
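
To put numbers on that estimate, here is a tiny sketch. It assumes
each refreshed pool runs one "udevadm settle" that hits the full
timeout and that the refreshes happen one after another; the function
and macro names are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Default settle timeout in seconds, as quoted in this thread. */
#define UDEVADM_DEFAULT_TIMEOUT_SECS 120u

/* Upper bound on the wait when every refreshed pool hits the timeout. */
unsigned int worst_case_settle_wait(unsigned int pools_refreshed,
                                    unsigned int timeout_secs)
{
    /* Guard against unsigned overflow in the multiplication. */
    assert(timeout_secs == 0 || pools_refreshed <= UINT_MAX / timeout_secs);
    return pools_refreshed * timeout_secs;
}
```

With four pools refreshed together the bound is 4 * 120 = 480 seconds;
refreshing only the selected pool keeps it at 120 seconds.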

@cole, any thoughts?

