[libvirt] 'stack smashing detected' in 1.2.18 (caused by virNetDevGFeatureAvailable)

Laine Stump laine at laine.org
Wed Aug 5 18:19:18 UTC 2015


On 08/05/2015 12:09 PM, Brian Rak wrote:
> I recently compiled 1.2.18 to start testing with it, and was getting
> this error on startup:
>
> *** stack smashing detected ***: libvirtd terminated
> ======= Backtrace: =========
> /lib64/libc.so.6(__fortify_fail+0x37)[0x7fe1ac631527]
> /lib64/libc.so.6(__fortify_fail+0x0)[0x7fe1ac6314f0]
> //lib/libvirt.so.0(+0xa7927)[0x7fe1aeda2927]
> //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0x947d)[0x7fe1958a047d]
> //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0xa6c2)[0x7fe1958a16c2]
> //lib/libvirt/connection-driver/libvirt_driver_nodedev.so(+0xaf4e)[0x7fe1958a1f4e]
> //lib/libvirt.so.0(virStateInitialize+0xb8)[0x7fe1aee6d0a8]
> libvirtd(+0x15120)[0x7fe1afae6120]
> //lib/libvirt.so.0(+0xd4975)[0x7fe1aedcf975]
> /lib64/libpthread.so.0(+0x30316079d1)[0x7fe1ada8c9d1]
> /lib64/libc.so.6(clone+0x6d)[0x7fe1ac6178fd]
>
> (gdb) bt
> #0  0x00007ffff4a8f625 in raise () from /lib64/libc.so.6
> #1  0x00007ffff4a90e05 in abort () from /lib64/libc.so.6
> #2  0x00007ffff4acd537 in __libc_message () from /lib64/libc.so.6
> #3  0x00007ffff4b5f527 in __fortify_fail () from /lib64/libc.so.6
> #4  0x00007ffff4b5f4f0 in __stack_chk_fail () from /lib64/libc.so.6
> #5  0x00007ffff72d0927 in virNetDevGetFeatures (ifname=<value
> optimized out>, out=<value optimized out>) at util/virnetdev.c:3200
> #6  0x00007fffdddce47d in udevProcessNetworkInterface
> (device=0x7fffd4071f70, def=0x6) at node_device/node_device_udev.c:694
> #7  udevGetDeviceDetails (device=0x7fffd4071f70, def=0x6) at
> node_device/node_device_udev.c:1272
> #8  0x00007fffdddcf6c2 in udevAddOneDevice (device=0x7fffd4071f70) at
> node_device/node_device_udev.c:1394
> #9  0x00007fffdddcff4e in udevProcessDeviceListEntry
> (privileged=<value optimized out>, callback=<value optimized out>,
> opaque=<value optimized out>)
>     at node_device/node_device_udev.c:1433
> #10 udevEnumerateDevices (privileged=<value optimized out>,
> callback=<value optimized out>, opaque=<value optimized out>) at
> node_device/node_device_udev.c:1463
> #11 nodeStateInitialize (privileged=<value optimized out>,
> callback=<value optimized out>, opaque=<value optimized out>) at
> node_device/node_device_udev.c:1773
> #12 0x00007ffff739b0a8 in virStateInitialize (privileged=true,
> callback=0x555555569070 <daemonInhibitCallback>,
> opaque=0x5555557f1db0) at libvirt.c:777
> #13 0x0000555555569120 in daemonRunStateInit (opaque=<value optimized
> out>) at libvirtd.c:947
> #14 0x00007ffff72fd975 in virThreadHelper (data=<value optimized out>)
> at util/virthread.c:206
> #15 0x00007ffff5fba9d1 in start_thread () from /lib64/libpthread.so.0
> #16 0x00007ffff4b458fd in clone () from /lib64/libc.so.6
>
> In IRC, we tracked this down to this bit of code:
>
>     g_cmd.cmd = ETHTOOL_GFEATURES;
>     g_cmd.size = GFEATURES_SIZE;
>     if (virNetDevGFeatureAvailable(ifname, &g_cmd))
>         ignore_value(virBitmapSetBit(*out, VIR_NET_DEV_FEAT_TXUDPTNL));
>
> GFEATURES_SIZE is currently defined as 2, but this value needs to be
> higher in order to support newer kernels.  It looks like this code was
> added in ac3ed2085fcbeecaf5aa347c0b1bffaf94fff293
>
> ethtool calculates this value based on the number of supported
> features: http://lxr.free-electrons.com/source/net/core/ethtool.c#L55
>
> I don't know enough about this to properly fix this, but raising
> GFEATURES_SIZE to 3 has fixed this issue for me (though, this will
> obviously need to go higher as more features get added)

(in later IRC conversation, Brian noted that raising GFEATURES_SIZE
*didn't* always eliminate the issue...)


The problem goes beyond that:

1) as far as I can see, g_cmd.size needs to be set to the number of
items in the array g_cmd.features. We're setting it to 2
(GFEATURES_SIZE), but we have allocated space for exactly *0* items in
that array. If we're going to tell the kernel we have 2 items in the
array, we need to actually have that space available; otherwise the
kernel will write past the end of g_cmd and overwrite whatever follows
it on the stack.

2) the feature we're looking for is called "TX_UDP_TNL" in libvirt, and
is manually #defined to be bit 25. From the title of the commit log for
the patch that added this code to libvirt, you can see that what we want
to check for is the feature called "tx-udp_tnl-segmentation". If you
look at the ethtool.c source from the kernel that Brian linked to
above, you'll see that
netdev_features_strings[NETIF_F_GSO_UDP_TUNNEL_BIT] is initialized to
"tx-udp_tnl-segmentation". But when you look up
NETIF_F_GSO_UDP_TUNNEL_BIT, it seems to be *26* in the enum where it is
defined:
http://osxr.org/linux/source/include/linux/netdev_features.h#0047

So are we checking for the wrong feature?
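To make the 25-vs-26 question concrete, here is a rough sketch of how a
feature bit number maps onto the 32-bit words that come back from the
kernel. It assumes (as the ethtool.c source linked above suggests) one
32-bit word per 32 features; every name starting with "sketch_" is made
up for illustration and is not libvirt or kernel API.

```c
#include <stdint.h>

#define SKETCH_BITS_PER_BLOCK 32u   /* assumption: 32 features per word */

/* Number of 32-bit words needed to hold nfeatures feature bits. */
static uint32_t sketch_blocks_for(uint32_t nfeatures)
{
    return (nfeatures + SKETCH_BITS_PER_BLOCK - 1) / SKETCH_BITS_PER_BLOCK;
}

/* Which word a given feature bit lives in. */
static uint32_t sketch_block_index(uint32_t bit)
{
    return bit / SKETCH_BITS_PER_BLOCK;
}

/* Mask selecting that feature bit inside its word. */
static uint32_t sketch_block_mask(uint32_t bit)
{
    return UINT32_C(1) << (bit % SKETCH_BITS_PER_BLOCK);
}

/* Test one feature bit in an array of "active" words; bits that fall
 * outside the array are reported as not set rather than read. */
static int sketch_feature_active(const uint32_t *active, uint32_t nblocks,
                                 uint32_t bit)
{
    uint32_t idx = sketch_block_index(bit);

    if (idx >= nblocks)
        return 0;
    return (active[idx] & sketch_block_mask(bit)) != 0;
}
```

With this mapping, bits 25 and 26 land in the same word but under
different masks (0x02000000 vs. 0x04000000), so a device that has only
bit 26 set would not be detected by a check against bit 25.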

It would be "really nice" if we could avoid #defining magic values like
TX_UDP_TNL and GFEATURES_SIZE in our source. In my quick investigation I
couldn't see a way around that though (since NETIF_F_GSO_UDP_TUNNEL_BIT
isn't available outside kernel source).

>
> This crash was occurring on a CentOS 6 system, running the ELRepo
> kernel-ml kernel.  The stock CentOS 6 kernel (2.6.32) does not appear
> to have sufficient features available to trigger this.

I guess it would depend on whether or not ETHTOOL_GFEATURES is defined
for the 2.6 kernels and, if so, on what was being overwritten beyond
the end of g_cmd. (There are other locals defined both before and after
g_cmd; all are used only *before* g_cmd is used. I'm not sure what order
the locals are laid out in memory, so I don't know which are being
overwritten.)
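Going back to point 1, one way to keep the kernel from writing past the
end of the request would be to allocate the header and the array
together, so that the size field always matches the storage that really
follows it. The following is only a sketch of that idea, not a proposed
patch: the struct definitions are local stand-ins that mirror the layout
of the kernel's struct ethtool_gfeatures and struct
ethtool_get_features_block, and all "sketch_" names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the kernel's per-word feature block (four 32-bit masks). */
struct sketch_features_block {
    uint32_t available;
    uint32_t requested;
    uint32_t active;
    uint32_t never_changed;
};

/* Stand-in for the request header. The trailing flexible array has no
 * storage of its own, so a plain local of this type has room for 0 items. */
struct sketch_gfeatures {
    uint32_t cmd;
    uint32_t size;                            /* entries in features[] */
    struct sketch_features_block features[];
};

/* Bytes needed for the header plus nblocks array entries. */
static size_t sketch_gfeatures_bytes(uint32_t nblocks)
{
    return sizeof(struct sketch_gfeatures)
        + (size_t)nblocks * sizeof(struct sketch_features_block);
}

/* Allocate a zeroed request whose size field matches the space that
 * actually follows the header. Returns NULL on allocation failure;
 * the caller releases it with free(). */
static struct sketch_gfeatures *sketch_gfeatures_new(uint32_t cmd,
                                                     uint32_t nblocks)
{
    struct sketch_gfeatures *g;

    assert(nblocks > 0);
    g = calloc(1, sketch_gfeatures_bytes(nblocks));
    if (!g)
        return NULL;
    g->cmd = cmd;
    g->size = nblocks;
    return g;
}
```

With two blocks the request needs 8 bytes of header plus 2 * 16 bytes of
array, i.e. 40 bytes, whereas the bare struct is only 8 bytes. That
32-byte difference is the region that gets written beyond the end of a
local g_cmd when size is set to 2.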



