[libvirt] [PATCH] cgroup: Fix start VMs coincidently failed

Daniel P. Berrange berrange at redhat.com
Thu Mar 20 17:16:34 UTC 2014


On Thu, Mar 20, 2014 at 05:04:13PM +0100, Michal Privoznik wrote:
> On 20.03.2014 08:24, Wangyufei (James) wrote:
> >>From 0163328efa67da1d63e504c86e323db5affa378f Mon Sep 17 00:00:00 2001
> >From: Wang Yufei <james.wangyufei at huawei.com>
> >Date: Thu, 20 Mar 2014 07:14:01 +0000
> >Subject: [PATCH] cgroup: Fix start VMs coincidently failed
> >When I start multi VMs coincidently and any of the cgroup directories
> >named machine doesn't exist. There's a chance that VM start failed because
> >of creating directory failed:
> >Unable to initialize /machine cgroup: File exists
> >When the errno returned by mkdir in virCgroupMakeGroup is EEXIST,
> >we should pass it through and continue to start the VM.
> >Signed-off-by: Wang Yufei <james.wangyufei at huawei.com>
> >---
> >  src/util/vircgroup.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> >index c5925b1..a10d6f6 100644
> >--- a/src/util/vircgroup.c
> >+++ b/src/util/vircgroup.c
> >@@ -924,6 +924,10 @@ virCgroupMakeGroup(virCgroupPtr parent,
> >          if (!virFileExists(path)) {
> >              if (!create ||
> >                  mkdir(path, 0755) < 0) {
> >+                if (errno == EEXIST) {
> >+                    VIR_FREE(path);
> >+                    continue;
> >+                }
> >                  /* With a kernel that doesn't support multi-level directory
> >                   * for blkio controller, libvirt will fail and disable all
> >                   * other controllers even though they are available. So
> >
> 
> NACK. Prior to starting a domain we make sure that no historical
> cgroup is lying around. So if we don't remove the cgroup there
> that's the actual bug and this just shadows it. We can't guarantee
> anything if the old cgroup is not removed and the new one is created
> by us. However, we are not removing the stale cgroup in case of LXC
> only in QEMU. Is it LXC that you are seeing this error on?

I think there is actually a genuine race condition here, at least when
using systemd for cgroup management.

When we invoke "CreateMachine" in the systemd-machined DBus API, it will
only create the directory hierarchy /sys/fs/cgroup/systemd/some/sub/dir/guestname.
If the other resource controllers are mounted separately, libvirt then has
to manually create the dirs /sys/fs/cgroup/{cpu,cpuacct,blkio,...}/some/sub/dir/guestname.
I believe it is thus entirely possible for there to be a race in creating
the intermediate nodes in this tree (ie the /some/sub/dir part), which may
be common to many guests.

When not using systemd, we require that the admin has pre-created the
/some/sub/dir part for all resource controllers, so we shouldn't have
a race in that non-systemd case.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



