[libvirt] [PATCH] Fix error report from nl_recvmsg

Daniel P. Berrange berrange at redhat.com
Thu Feb 28 16:37:58 UTC 2013


On Fri, Mar 01, 2013 at 12:31:34AM +0800, Daniel Veillard wrote:
> On Thu, Feb 28, 2013 at 04:24:17PM +0000, Daniel P. Berrange wrote:
> > On Thu, Feb 28, 2013 at 04:16:37PM +0000, Daniel P. Berrange wrote:
> [...]
> > Oh joy, it is worse than you could possibly imagine.
> > 
> > On libnl1 the return value is a valid -errno, while in libnl3
> > the return value is an error code of its own invention.
> > 
> > Further in libnl1 we can't rely on the global errno, because
> > other calls libnl does may have overwritten it with garbage.
> > We must use the return value from the function.
> > 
> > For yet more fun, libnl1's error handling is not threadsafe.
> > Whenever it hits an error with a syscall, it internally
> > runs __nl_error() which mallocs/frees a static global
> > variable containing the contents of strerror() which is
> > itself also not threadsafe :-(
> > 
> > Did I mention we should just throw out all versions of libnl
> > entirely and talk to the kernel ourselves..... It has caused
> > us no end of pain in all its versions.
> 
>   No chance of educating them instead ? We can't rewrite everything :)

Sure, it has been getting better over time, but that doesn't help us
for all existing distros, particularly rhel-5 and rhel-6, where libvirt
is going to be crash-prone due to unsolvable libnl design flaws in
those versions.
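
To illustrate the error reporting constraint quoted above (use the
function's return value rather than the global errno, and avoid the
non-threadsafe strerror()), here is a minimal sketch. It assumes the
libnl1 convention where the return value is a negative errno. The helper
name example_format_error() is made up for this example and is not a
libvirt or libnl function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn a negative-errno return value into a message
 * in a caller-supplied buffer. It never reads the global errno and never
 * calls strerror(), so it keeps no shared static state of its own. */
static const char *example_format_error(int ret, char *buf, size_t buflen)
{
    int err = ret < 0 ? -ret : ret;

#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU strerror_r(): returns a pointer that may or may not be buf. */
    const char *msg = strerror_r(err, buf, buflen);
    if (msg != buf)
        snprintf(buf, buflen, "%s", msg);
#else
    /* XSI strerror_r(): returns 0 on success and fills buf. */
    if (strerror_r(err, buf, buflen) != 0)
        snprintf(buf, buflen, "Unknown error %d", err);
#endif
    return buf;
}
```

This only covers the libnl1 case; the libnl3 return values are a separate
set of error codes and would need their own mapping.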

Looking at the code, there are two basic sets of APIs we rely on:

 nl_XXX
 nla_XXX

The nl_XXX APIs are basically just wrappers around the normal socket()
based APIs, hiding a few bits about the AF_NETLINK socket type. It
would be trivial to do all that work ourselves, since socket() handling
is nothing special. These are the APIs which have caused us multiple
thread safety crash problems over the years.
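
As a rough sketch of what doing that work ourselves might look like,
using only the plain socket calls: open and bind an AF_NETLINK socket,
then receive with recvmsg(), returning a negative errno directly so the
caller never depends on global state. The example_netlink_*() names are
hypothetical, and this is not code from libvirt or libnl.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: returns a socket fd >= 0, or -errno on failure. */
static int example_netlink_open(int protocol)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, protocol);

    if (fd < 0)
        return -errno;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;            /* let the kernel pick a unique port id */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        return -saved;
    }
    return fd;
}

/* Hypothetical helper: returns bytes received, or -errno on failure. */
static ssize_t example_netlink_recv(int fd, void *buf, size_t len)
{
    struct sockaddr_nl from;
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = &from;
    msg.msg_namelen = sizeof(from);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    do {
        n = recvmsg(fd, &msg, 0);   /* retry if interrupted by a signal */
    } while (n < 0 && errno == EINTR);

    return n < 0 ? -errno : n;
}
```

A caller would do something like fd = example_netlink_open(NETLINK_ROUTE)
and treat any negative result as the error code itself. Everything here
works on the fd and buffers passed in, with no library-level globals.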

The nla_XXX APIs are all about complex data formatting, and we wouldn't
want to try to do that ourselves. Fortunately the nla_XXX APIs are not
the ones that are causing us trouble - AFAICT those look pretty safe
in what they do from a thread POV, since they're all just working on the
object instances you pass in, with no global state.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|



