[libvirt RFC PATCH] network: NetworkManager script to monitor for conflicts with new interfaces

Laine Stump laine at redhat.com
Mon May 4 16:04:45 UTC 2020


There has been a problem for several years with libvirt's default
network conflicting with the host network connection on new installs,
particularly when the "host" is actually a virtual machine that is,
itself, connected to the libvirt default network on its respective
host. If the two default networks use the same subnet, and if the
nested host's libvirtd happens to start its network before the system
networking connects to the toplevel host, then network connectivity to
the guest is just silently non-working.

We've tried several things over the years to eliminate this problem,
including:

1) Checking for conflicting routes/interfaces during installation of
the libvirt-daemon-config-network package (the one containing the
default network config file) which tries different subnets until it
finds one that doesn't conflict. This helps in some cases, but fails
on 2 points: a) it doesn't help if the installation environment is
different from the environment where the system is actually run (e.g.
a "live CD" image of a distro, which is built in a container by the
distro maintainers, then could be run in any number of places), and
b) it modifies the
installed files during the rpm installation post_install script, which
is now discouraged because people building container images don't like
dealing with that.

2) When starting a libvirt network, we now check for any route or
interface that conflicts with the new network's IPs and routes. This
doesn't fix the problem, but does notify the user of the problem *as
long as libvirt starts its networks after the host has started its
system networks*.

Neither of these helps in the situation where libvirt's networks are
started first. The script in this patch uses a NetworkManager facility
(dispatcher.d scripts, which are run whenever any interface is
brought up or down) to check for any libvirt networks that conflict
with the new NetworkManager interface, and if it finds a conflict it
logs a message and destroys the libvirt network. So as with case (2)
above, it doesn't solve the problem, but it does make it more obvious,
and also makes sure the host networking isn't borked, so you can still
ssh into the machine and fix it.
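
The check described above can be sketched roughly as follows. This is
only an illustration, not the script in this patch: the function names
and the dict of libvirt networks are made up for the example, and only
the IP4_*/IP6_* environment variable names are taken from what
NetworkManager passes to dispatcher scripts.

```python
from ipaddress import ip_network


def host_networks(env):
    """Collect the networks NetworkManager reports for the interface.

    `env` is a dict shaped like the dispatcher environment
    (IP4_NUM_ADDRESSES, IP4_ADDRESS_0, IP4_NUM_ROUTES, ...).
    """
    nets = []
    for family in ("IP4", "IP6"):
        for plural, singular in (("ADDRESSES", "ADDRESS"), ("ROUTES", "ROUTE")):
            count = int(env.get("%s_NUM_%s" % (family, plural)) or 0)
            for num in range(count):
                value = env.get("%s_%s_%d" % (family, singular, num))
                if value:
                    # the first word is "address/prefix"; the rest
                    # (gateway, metric) is ignored
                    nets.append(ip_network(value.split()[0], strict=False))
    return nets


def find_conflicts(env, libvirt_nets):
    """Return names of libvirt networks whose subnet exactly matches
    one of the host networks.

    `libvirt_nets` is a hypothetical {name: "address/prefix"} mapping.
    """
    hostnets = host_networks(env)
    return [name for name, cidr in libvirt_nets.items()
            if ip_network(cidr, strict=False) in hostnets]


env = {
    "IP4_NUM_ADDRESSES": "1",
    "IP4_ADDRESS_0": "192.168.122.57/24 192.168.122.1",
    "IP4_NUM_ROUTES": "0",
}
print(find_conflicts(env, {"default": "192.168.122.1/24",
                           "isolated": "10.0.0.1/24"}))
```

Like the script itself, this only treats an exact subnet match as a
conflict; overlapping subnets of different sizes are not detected.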

There are a few caveats/misgivings with the script I've come up with,
which cause me to post it as just an RFC:

1) It's written in python, and uses libxml2

When I first sat down to make this script (following a suggestion by
danpb during an IRC discussion), I immediately put #!/bin/bash at the
top of the file, because it's supposed to be a script. But then I
remembered that we're trying to narrow down the usage of languages in
libvirt, and anyway it would be nice to be able to properly parse the
XML to get the IP addresses/netmasks/prefixes. So I instead wrote it
in python. This makes for a less ugly program, but it does mean that
the daemon-config-network package is now dependent on python3-libxml2
(and python3 of course), which pulls in a bunch of other packages.

Everybody's dev systems already have these packages and their
dependencies installed (since both are required to build) and many
other users' systems already have them (they are required by
virt-install and virt-manager, among others), including the Fedora
live CD, for example. But not *all* systems have them. There has been
a lot of work to reduce package bloat caused by pulling in more and
more packages, so I'm reluctant to contribute to that. On the other
hand, someone who is looking to minimize their system footprint will
also probably not be installing the libvirt default network, so maybe
it's acceptable (I'm leaning towards "not", but want to know if anyone
else has a different opinion).

2) As delivered, it checks for conflicts with *all* libvirt
networks.

While that was fun to write, and in theory is the right
thing to do, in practice it may be / probably is overkill. As we
discussed in IRC, the main time when this is a problem is just after
the first boot of a new OS / libvirt installation, and the only
libvirt network that's going to exist at that time is the default
network. So while having a loop that scans all libvirt networks is
more complete, in almost all cases it is just wasting time. (I did
provide a bool at the top of the script that can be modified to tell
it to only check the default network, but of course the script is
installed in /usr/lib, and it's not proper to modify files in
/usr/lib, so that isn't a real-world solution, but more a way to allow
people (i.e. "me") to test out the different behaviors right now.)

3) This only works if NetworkManager is enabled.

I brought this up when danpb first mentioned the idea, and he rightly
pointed out that every report of this problem we've had has been from
the first boot of a new install on a system that used NetworkManager
- anybody using any other method of networking config on their host
has had to manually intervene to get it set up, but NM tries to do
everything auto-magically. So while there may still be a rare
occurrence of a silent networking failure even with this script
installed, this should eliminate 99% of the cases.
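
Regarding caveat (2), one possible alternative to editing a bool in the
installed file would be to read the knob from the environment, so that
a small wrapper in /etc could export it before calling the script. A
minimal sketch, assuming a made-up variable name
(LIBVIRT_NM_CHECK_DEFAULT_ONLY is not an existing setting):

```python
import os


def check_default_only(env=os.environ):
    """Return True if only the default network should be checked.

    LIBVIRT_NM_CHECK_DEFAULT_ONLY is a hypothetical variable name used
    only for this illustration. Unset or unrecognized values fall back
    to checking all active networks, matching the script's default.
    """
    value = env.get("LIBVIRT_NM_CHECK_DEFAULT_ONLY", "")
    return value.strip().lower() in ("1", "true", "yes")


print(check_default_only({"LIBVIRT_NM_CHECK_DEFAULT_ONLY": "yes"}))
```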

So, the questions I have are:

1) Do we want to allow adding the dependency on python3-libxml2 to the
libvirt-daemon-config-network package in order to avoid adding a bash
script? Or should I just redo it in bash?

2) Do we want something that checks all active networks as this does,
or just the default network (and if the latter, should we make it
easily modifiable to check all networks, or just strip it down as much
as possible)?

3) I could eliminate the libxml2 dependency by just grepping the xml
for "<ip .*address=". Is reduced correctness (which is probably 0%
different in practice) worth the reduced package dependencies?

As a single question: on a sliding scale of "this script" ----->
"simple bash script" where do we want to land?
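
As one possible point on that scale (not something this patch does),
the XML could also be parsed with the xml.etree.ElementTree module from
the Python standard library, which would avoid the python3-libxml2
dependency while still parsing properly. A rough sketch, with a
hand-written sample network definition and a made-up function name:

```python
import xml.etree.ElementTree as ET
from ipaddress import ip_network

# Hand-written sample resembling a libvirt network definition
SAMPLE_XML = """
<network>
  <name>default</name>
  <bridge name='virbr0'/>
  <mac address='52:54:00:aa:bb:cc'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
    </dhcp>
  </ip>
  <ip family='ipv6' address='fd00::1' prefix='64'/>
</network>
"""


def libvirt_networks(xml):
    """Return the subnets described by the direct children of <network>
    that carry an address plus either a prefix or a netmask."""
    nets = []
    for elem in ET.fromstring(xml):
        address = elem.get("address")
        if not address:
            continue
        # ip_network() accepts both "addr/prefixlen" and "addr/netmask"
        suffix = elem.get("prefix") or elem.get("netmask")
        if not suffix:
            # e.g. <mac address='...'/>, which is not an IP address
            continue
        nets.append(ip_network("%s/%s" % (address, suffix), strict=False))
    return nets


print(libvirt_networks(SAMPLE_XML))
```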

Signed-off-by: Laine Stump <laine at redhat.com>
---
 libvirt.spec.in                         |   2 +
 src/network/Makefile.inc.am             |   8 +-
 src/network/nm-dispatcher-check-nets.py | 182 ++++++++++++++++++++++++
 3 files changed, 191 insertions(+), 1 deletion(-)
 create mode 100755 src/network/nm-dispatcher-check-nets.py

diff --git a/libvirt.spec.in b/libvirt.spec.in
index 6abf97df85..e40068b2be 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -496,6 +496,7 @@ Requires: libvirt-libs = %{version}-%{release}
 Requires: dnsmasq >= 2.41
 Requires: radvd
 Requires: iptables
+Requires: python3-libxml2
 
 %description daemon-driver-network
 The network driver plugin for the libvirtd daemon, providing
@@ -1630,6 +1631,7 @@ exit 0
 %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/
 %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper
 %{_libdir}/%{name}/connection-driver/libvirt_driver_network.so
+%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
 
 %if %{with_firewalld_zone}
 %{_prefix}/lib/firewalld/zones/libvirt.xml
diff --git a/src/network/Makefile.inc.am b/src/network/Makefile.inc.am
index 196a30e16c..d58e09f88b 100644
--- a/src/network/Makefile.inc.am
+++ b/src/network/Makefile.inc.am
@@ -170,6 +170,9 @@ install-data-network:
 	( cd $(DESTDIR)$(confdir)/qemu/networks/autostart && \
 	  rm -f default.xml && \
 	  $(LN_S) ../default.xml default.xml )
+	$(MKDIR_P) "$(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d"
+	$(INSTALL_DATA) $(srcdir)/network/nm-dispatcher-check-nets.py \
+	  $(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
 if WITH_FIREWALLD_ZONE
 	$(MKDIR_P) "$(DESTDIR)$(prefix)/lib/firewalld/zones"
 	$(INSTALL_DATA) $(srcdir)/network/libvirt.zone \
@@ -189,7 +192,10 @@ endif WITH_FIREWALLD_ZONE
 
 endif WITH_NETWORK
 
-EXTRA_DIST += network/default.xml network/libvirt.zone
+EXTRA_DIST += \
+  network/default.xml \
+  network/nm-dispatcher-check-nets.py \
+  network/libvirt.zone
 
 .PHONY: \
 	install-data-network \
diff --git a/src/network/nm-dispatcher-check-nets.py b/src/network/nm-dispatcher-check-nets.py
new file mode 100755
index 0000000000..9225a08ea1
--- /dev/null
+++ b/src/network/nm-dispatcher-check-nets.py
@@ -0,0 +1,182 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2012-2019 Red Hat, Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+import libvirt
+import sys
+import os
+import libxml2
+from ipaddress import ip_network
+
+# This script should be installed in
+# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
+# called by NetworkManager every time a network interface is brought up
+# or down. When a new network comes up, it checks the libvirt virtual
+# networks to see if their IP address(es) (including any static
+# routes) are in conflict with the IP address(es) (or static routes)
+# of the newly added interface.  If so, the libvirt network is
+# disabled. It is assumed that the user will notice that their guests
+# no longer have network connectivity (and/or the message logged by
+# this script), see that the network has been disabled, and then
+# realize the conflict when they try to restart it.
+#
+# Set checkDefaultOnly=False to check *all* active libvirt networks
+# for conflicts with the new interface. Set to True to check only the
+# libvirt default network (since most networks other than the default
+# network are added post-install at a time when all of the host's other
+# networks are already active, it may be overkill to check all of the
+# libvirt networks for conflict here, which just adds more needless
+# overhead to bringing up a new host interface).
+#
+checkDefaultOnly = False
+
+# NB: since this file is installed in /usr/lib, it really shouldn't be
+# modified by the user, but instead should be copied to
+# /etc/NetworkManager/dispatcher.d, where it will override the copy in
+# /usr/lib. Even that isn't a proper solution though - if we're going
+# to actually have this config knob, perhaps we should check for it in
+# the environment, and if someone wants to modify it they can put a
+# short script in /etc that exports an environment variable and then
+# calls this script? Just thinking out loud here...
+
+def checkconflict(conn, netname, hostnets, hostif):
+
+    # ignore if the network has been brought down or removed since we
+    # got the list
+    try:
+        net = conn.networkLookupByName(netname)
+    except libvirt.libvirtError:
+        return
+
+    if not net.isActive():
+        return
+
+    xml = net.XMLDesc()
+    doc = libxml2.parseDoc(xml)
+    ctx = doc.xpathNewContext()
+
+    # see if NetworkManager is informing us that this libvirt network
+    # itself is coming online
+    bridge = ctx.xpathEval("/network/bridge/@name")
+    if bridge and bridge[0].content == hostif:
+        return
+
+    # check *all* the addresses of this network
+    addrs = ctx.xpathEval("/network/*[@address]")
+    for ip in addrs:
+        ctx.setContextNode(ip)
+        address = ctx.xpathEval("@address")
+        prefix = ctx.xpathEval("@prefix")
+        netmask = ctx.xpathEval("@netmask")
+
+        if not (address and len(address[0].content)):
+            continue
+
+        addrstr = address[0].content
+        if not (prefix and len(prefix[0].content)):
+            # check for a netmask
+            if not (netmask and len(netmask[0].content)):
+                # this element has address, but no prefix or netmask
+                # probably it is <mac address ...> so we can ignore it
+                continue
+            # convert netmask to prefix
+            prefixstr = str(ip_network("0.0.0.0/%s" % netmask[0].content).prefixlen)
+        else:
+            prefixstr = prefix[0].content
+
+        virtnetaddress = ip_network("%s/%s" % (addrstr, prefixstr), strict = False)
+        # print("network %s address %s" % (netname, str(virtnetaddress)))
+        for hostnet in hostnets:
+            if virtnetaddress == hostnet:
+                # There is a conflict with this libvirt network and the specified
+                # net, so we need to disable the libvirt network
+                print("Conflict with host net %s when starting interface %s - bringing down libvirt network '%s'"
+                      % (str(hostnet), hostif, netname))
+                try:
+                    net.destroy()
+                except libvirt.libvirtError:
+                    print("Failed to destroy network %s" % netname)
+                return
+    return
+
+
+def addHostNets(hostnets, countenv, addrenv):
+
+    count = os.getenv(countenv)
+    if not count or int(count) == 0:
+        return
+
+    for num in range(int(count)):
+        addrstr = os.getenv("%s_%d" % (addrenv, num))
+        if not addrstr:
+            continue
+
+        net = ip_network(addrstr.split()[0], strict=False)
+        if net:
+            hostnets.append(net)
+    return
+
+
+############################################################
+
+if len(sys.argv) < 3 or sys.argv[2] != "up":
+    sys.exit(0)
+
+hostif = sys.argv[1]
+
+try:
+    conn = libvirt.open(None)
+except libvirt.libvirtError:
+    print('Failed to open connection to the hypervisor')
+    sys.exit(0)
+
+if checkDefaultOnly:
+    try:
+        nets = [conn.networkLookupByName("default")]
+    except libvirt.libvirtError:
+        sys.exit(0)
+    # an inactive default network is skipped later by checkconflict()
+else:
+    nets = conn.listAllNetworks(libvirt.VIR_CONNECT_LIST_NETWORKS_ACTIVE)
+    if not nets:
+        sys.exit(0)
+
+# We have at least one active network. Build a list of all network
+# routes added by the new interface, and compare that list to the list
+# of all networks used by each active libvirt network. If any are an
+# exact match, then we have a conflict and need to shut down the
+# libvirt network to avoid killing host networking.
+
+# When NetworkManager calls scripts in /etc/NetworkManager/dispatcher.d
+# it will have all IP addresses and routes associated with the interface
+# that is going up or down in the following environment variables:
+#
+# IP4_NUM_ADDRESSES - number of IPv4 addresses
+# IP4_ADDRESS_N - one variable for each address, starting at _0
+# IP4_NUM_ROUTES - number of IPv4 routes
+# IP4_ROUTE_N - one for each route, starting at _0
+# (replace "IP4" with "IP6" and repeat)
+#
+hostnets = []
+addHostNets(hostnets, "IP4_NUM_ADDRESSES", "IP4_ADDRESS")
+addHostNets(hostnets, "IP4_NUM_ROUTES", "IP4_ROUTE")
+addHostNets(hostnets, "IP6_NUM_ADDRESSES", "IP6_ADDRESS")
+addHostNets(hostnets, "IP6_NUM_ROUTES", "IP6_ROUTE")
+
+for net in nets:
+
+    checkconflict(conn, net.name(), hostnets, hostif)
-- 
2.25.4



