[libvirt] Losing lxc guests when restarting libvirt

Guido Günther agx at sigxcpu.org
Sat Dec 24 16:14:44 UTC 2016


Hi Cedric,
On Wed, Dec 21, 2016 at 02:36:39PM +0100, Cedric Bosdonnat wrote:
> Hey Christian,
> 
> On Tue, 2016-12-20 at 12:29 +0100, Christian Ehrhardt wrote:
> > Hi,
> > I found an issue in libvirt related to libvirt-lxc, but have failed to find the root cause.
> > 
> > The TL;DR is: libvirt-lxc guests get killed on libvirt restart due to "internal error: No valid cgroup for machine"
> > 
> > I was able to reproduce it with the libvirt 1.3.1, 2.4 and 2.5 packages in Ubuntu and Debian.
> > I wanted to ask for two things:
> > - wider coverage where this does reproduce
> 
> I couldn't reproduce here with openSUSE Tumbleweed and libvirt 2.5 packages.

I had a short look and it seems like this sequence is killing all running
libvirt-lxc guests reliably:

  # no lxc guest running yet
  export LIBVIRT_DEFAULT_URI=lxc:///
  DOMAIN=sl
  systemctl daemon-reload

  # start lxc guest
  virsh start ${DOMAIN}
  sleep 1  # give vm some time to start
  systemctl restart libvirtd
  virsh list | grep -qs "${DOMAIN}[[:space:]]\+running"
  # lxc guest gone

The important part is the "systemctl daemon-reload". If one leaves that
out, libvirtd restarts no longer kill off any lxc domains.

The issue is that on reattach libvirt fails in virCgroupNewDetectMachine
because /proc/<pid-of-lxc-container>/cgroup has changed after
libvirtd's restart:

Before systemctl restarts libvirtd:

10:perf_event:/machine/lxc-21383-sl.libvirt-lxc
9:cpuset:/machine/lxc-21383-sl.libvirt-lxc
8:net_cls,net_prio:/machine/lxc-21383-sl.libvirt-lxc
7:pids:/system.slice/libvirtd.service
6:memory:/machine/lxc-21383-sl.libvirt-lxc
5:cpu,cpuacct:/machine/lxc-21383-sl.libvirt-lxc
4:devices:/machine/lxc-21383-sl.libvirt-lxc
3:freezer:/machine/lxc-21383-sl.libvirt-lxc
2:blkio:/machine/lxc-21383-sl.libvirt-lxc
1:name=systemd:/system.slice/libvirtd.service

After systemctl restart libvirtd:

10:perf_event:/machine/lxc-21383-sl.libvirt-lxc
9:cpuset:/machine/lxc-21383-sl.libvirt-lxc
8:net_cls,net_prio:/machine/lxc-21383-sl.libvirt-lxc
7:pids:/system.slice/libvirtd.service
6:memory:/system.slice/libvirtd.service
5:cpu,cpuacct:/system.slice/libvirtd.service
4:devices:/system.slice/libvirtd.service
3:freezer:/machine/lxc-21383-sl.libvirt-lxc
2:blkio:/system.slice/libvirtd.service
1:name=systemd:/system.slice/libvirtd.service

So the process has been moved to different memory, cpu,cpuacct, devices
and blkio cgroups, and libvirtd therefore can't find it anymore. The error
in the log looks like:

debug : virCgroupValidateMachineGroup:333 : Name 'libvirtd.service' for controller 'cpu' does not match 'sl', 'lxc-21383-sl', 'sl.libvirt-lxc', 'machine-lxc\x2dsl.scope' or 'machine-lxc\x2d21383\x2dsl.scope'
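
The comparison of the two /proc/<pid>/cgroup listings can be automated
with a small helper. This is only a sketch: the function name
cgroup_changes and the snapshot file names are made up for illustration,
and it assumes the cgroup v1 "id:controllers:path" line format shown
above, with no colons inside the cgroup path.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of libvirt or systemd):
# print every controller whose cgroup path differs between two
# snapshots of /proc/<pid>/cgroup.
# Assumes the cgroup v1 line format "id:controllers:path" and that
# the path itself contains no ':'.
cgroup_changes() {
    before=$1
    after=$2
    awk -F: '
        # First file: remember the path for each controller set.
        NR == FNR { old[$2] = $3; next }
        # Second file: report controllers whose path changed.
        ($2 in old) && old[$2] != $3 {
            printf "%s: %s -> %s\n", $2, old[$2], $3
        }
    ' "$before" "$after"
}

# Possible usage (PID is the pid of the container process to inspect):
#   cat /proc/$PID/cgroup > /tmp/cgroup.before
#   systemctl restart libvirtd
#   cat /proc/$PID/cgroup > /tmp/cgroup.after
#   cgroup_changes /tmp/cgroup.before /tmp/cgroup.after
```

With the two listings above as input, it should report only the memory,
cpu,cpuacct, devices and blkio lines.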

This does _not_ happen if one restarts libvirtd right after the "systemctl
daemon-reload" or if one drops the "systemctl daemon-reload" from the above
example. It also does not happen if one stops libvirtd via systemd but
starts it directly as /usr/sbin/libvirtd. So the problem only shows up with
the combination of

* systemctl daemon-reload
* libvirtd is restarted via systemctl

I've looked at audit logs and straced pid 1 without spotting
anything. Any ideas where to go looking now?

This is systemd 232.

Cheers,
 -- Guido


> 
> > - your expertise on the case itself.
> 
> It seems that you'll need to check what's going on in virCgroupDetect().
> 
> > Steps to reproduce:
> > 1. Spawn new KVM Guest of your choice
> > 2. install test dependencies
> > $ apt-get install libvirt-daemon-system libvirt-clients libxml2-utils
> > # or package managers / package names of your chosen os
> > 3. run the following sequence as root
> > export LIBVIRT_DEFAULT_URI=lxc:///
> > cat << EOF > /tmp/smoke-lxc.xml
> > <domain type='lxc'>
> >   <name>sl</name>
> >   <memory unit='KiB'>256000</memory>
> >   <currentMemory unit='KiB'>256000</currentMemory>
> >   <vcpu placement='static'>1</vcpu>
> >   <os>
> >     <type>exe</type>
> >     <init>/bin/bash</init>
> >   </os>
> >   <features>
> >     <privnet/>
> >   </features>
> >   <clock offset='utc'/>
> >   <devices>
> >     <emulator>/usr/lib/libvirt/libvirt_lxc</emulator>
> 
> The emulator should be removed from the config for portability
> purpose: the libvirt_lxc path may vary from a distro / arch to another
> and libvirt's lxc driver is able to auto-add it.
> 
> --
> Cedric
> 
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list



