[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] Report of tsan first testing with libvirt(d)



On Fri, May 18, 2012 at 11:55:05AM +0800, Daniel Veillard wrote:
> One of the  ouput raised frequently is:
> ----------------------------------------------------------------------
> ==5838== WARNING: Possible data race during write of size 4 at 0x34723B3BD0: {{{
> ==5838==    T1 (L{L17, L125}):
> ==5838==     #0  ptmalloc_unlock_all2 /lib64/libc-2.14.90.so
> ==5838==     #1  fork /lib64/libc-2.14.90.so
> ==5838==     #2  virFork /usr/src/debug/libvirt-0.9.12/src/util/command.c:268
> ==5838==     #3  virExecWithHook.constprop.5 /usr/src/debug/libvirt-0.9.12/src/util/command.c:489
> ==5838==     #4  virCommandRunAsync /usr/src/debug/libvirt-0.9.12/src/util/command.c:2177
> ==5838==     #5  virCommandRun /usr/src/debug/libvirt-0.9.12/src/util/command.c:1970
> ==5838==     #6  qemuCapsExtractVersionInfo /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1379
> ==5838==     #7  qemuCapsExtractVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1568
> ==5838==     #8  qemudGetVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1320
> ==5838==     #9  virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
> ==5838==     #10 remoteDispatchGetVersionHelper /usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
> ==5838==     #11 virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5838==   Concurrent read(s) happened at (OR AFTER) these points:
> ==5838==    T11 (L{}):
> ==5838==     #0  arena_thread_freeres /lib64/libc-2.14.90.so
> ==5838==     #1  __libc_thread_freeres /lib64/libc-2.14.90.so
> ==5838==     #2  start_thread /lib64/libpthread-2.14.90.so
> ==5838==   Address 0x34723B3BD0 is 0 bytes inside data symbol "list_lock"
> ==5838==   Locks involved in this report (reporting last lock sites): {L17, L125}
> ==5838==    L17 (0x5339B00)
> ==5838==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5838==     #1  virFork /usr/src/debug/libvirt-0.9.12/src/util/command.c:266
> ==5838==     #2  virExecWithHook.constprop.5 /usr/src/debug/libvirt-0.9.12/src/util/command.c:489
> ==5838==     #3  virCommandRunAsync /usr/src/debug/libvirt-0.9.12/src/util/command.c:2177
> ==5838==     #4  virCommandRun /usr/src/debug/libvirt-0.9.12/src/util/command.c:1970
> ==5838==     #5  qemuCapsExtractVersionInfo /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1379
> ==5838==     #6  qemuCapsExtractVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_capabilities.c:1568
> ==5838==     #7  qemudGetVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1320
> ==5838==     #8  virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
> ==5838==     #9  remoteDispatchGetVersionHelper /usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
> ==5838==     #10 virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5838==     #11 virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5838==    L125 (0x4C0A0A00)
> ==5838==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5838==     #1  qemudGetVersion /usr/src/debug/libvirt-0.9.12/src/qemu/qemu_driver.c:1319
> ==5838==     #2  virConnectGetVersion /usr/src/debug/libvirt-0.9.12/src/libvirt.c:1640
> ==5838==     #3  remoteDispatchGetVersionHelper /usr/src/debug/libvirt-0.9.12/daemon/remote_dispatch.h:6862
> ==5838==     #4  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5838==     #5  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5838==     #6  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5838==     #7  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5838==    Race verifier data: 0x347207CF98,0x347215D6BB
> ==5838== }}}
> 
>   I don't know what "Race verifier data" means but it seems that we do
> *pid = fork(); in the main thread and that another thread is accessing
> without synchronization. The probram properly report that 2 locks are
> held at that point, the virLogLock() and qemuDriverLock() and give full
> stack trace on where those locks were taken. That's convenient but
> I could not really spot where the second thread did the access, I suspect
> that's somehow related to libpthread considering the minimal stack trace
> of the second thread T11 ...

No, those first 2 stack traces are refering to the locks inside GLibC's
malloc() implementation, so not something in our code. THis is bogus
I believe.

>   I however found out a couple of issues not related to libpthread:
> 
> ==5733== WARNING: Possible data race during write of size 1 at 0x34723B6B14: {{{
> ==5733==    T5 (L{}):
> ==5733==     #0  Replace_memcpy /mnt/data/build/slave/full_linux_build/build/tsan/ts_replace.h:113
> ==5733==     #1  __GI_memcpy /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:2359
> ==5733==     #2  xdrmem_getbytes /lib64/libc-2.14.90.so
> ==5733==     #3  xdr_string /lib64/libc-2.14.90.so
> ==5733==     #4  xdr_remote_nonnull_string /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:31
> ==5733==     #5  xdr_remote_nonnull_domain /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:58
> ==5733==     #6  xdr_remote_domain_get_info_args /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:1390
> ==5733==     #7  virNetMessageDecodePayload /usr/src/debug/libvirt-0.9.12/src/rpc/virnetmessage.c:382
> ==5733==     #8  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:403
> ==5733==     #9  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #10 virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #11 virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==   Concurrent write(s) happened at (OR AFTER) these points:
> ==5733==    T2 (L{}):
> ==5733==     #0  xdrmem_getbytes /lib64/libc-2.14.90.so
> ==5733==     #1  xdr_string /lib64/libc-2.14.90.so
> ==5733==     #2  xdr_remote_nonnull_string /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:31
> ==5733==     #3  xdr_remote_nonnull_domain /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:58
> ==5733==     #4  xdr_remote_domain_get_xml_desc_args /usr/src/debug/libvirt-0.9.12/daemon/../src/remote/remote_protocol.c:1532
> ==5733==     #5  virNetMessageDecodePayload /usr/src/debug/libvirt-0.9.12/src/rpc/virnetmessage.c:382
> ==5733==     #6  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:403
> ==5733==     #7  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #8  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #9  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==   Address 0x34723B6B14 is 0 bytes inside data symbol "crud.6424"
> ==5733==    Race verifier data: 0x4A091CE,0x347212579D
> ==5733== }}}
> 
>   that one could potentially be nasty, it seems that 2 threads decoding
> rpc payload in parallel hit an unsynchronized write access in
> the xdr routine for xdr_remote_nonnull_string() . I didn't had the
> debuginfo for glibc installed when I ran it so I don't have the
> exact informations at the top of the stacks

This looks bogus to me too. These are two completely separate RPC messages
being dispatched here (virDomainGetInfo and virDomainGetXMLDesc). If they
truely pointed to the same data, we'd be going SEGV tastic all over the
place.

> 
>   The other one raised is actually something I had initially found
> with the first run on libxml2 so it's relatively consistent:

Yay libxml2 is not thread safe !

      "The  function  rand() is not reentrant or thread-safe, since
       it uses hidden state that is modified on  each  call.   This
       might just be the seed value to be used by the next call, or
       it might be something  more  elaborate.   In  order  to  get
       reproducible  behavior in a threaded application, this state
       must be made explicit; this can be done using the  reentrant
       function rand_r()."

You should probably validate libxml2 against all the functions
have blacklisted for libvirt in Makefile.nonreentrant

> 
> ==5733== WARNING: Possible data race during read of size 4 at 0x34723B10B4: {{{
> ==5733==    T1 (L{L173}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlDictCreate /usr/src/debug/libxml2-2.8.0/dict.c:491
> ==5733==     #4  xmlInitParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1620
> ==5733==     #5  xmlNewParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1874
> ==5733==     #6  xmlCreateURLParserCtxt /usr/src/debug/libxml2-2.8.0/parser.c:13546
> ==5733==     #7  xmlReadFile /usr/src/debug/libxml2-2.8.0/parser.c:14713
> ==5733==     #8  xmlRelaxNGParse /usr/src/debug/libxml2-2.8.0/relaxng.c:7480
> ==5733==     #9  (no symbols) /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #10 ncf_init /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #11 interfaceOpenInterface /usr/src/debug/libvirt-0.9.12/src/interface/netcf_driver.c:144
> ==5733==   Concurrent write(s) happened at (OR AFTER) these points:
> ==5733==    T10 (L{L156}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlHashCreate /usr/src/debug/libxml2-2.8.0/hash.c:193
> ==5733==     #4  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:4125
> ==5733==     #5  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6380
> ==5733==     #6  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==     #7  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6319
> ==5733==     #8  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6520
> ==5733==     #9  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==   Address 0x34723B10B4 is 20 bytes inside data symbol "randtbl"
> ==5733==   Locks involved in this report (reporting last lock sites): {L156, L173}
> ==5733==    L156 (0x402E790)
> ==5733==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5733==     #1  remoteDispatchOpenHelper /usr/src/debug/libvirt-0.9.12/daemon/remote.c:683
> ==5733==     #2  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5733==     #3  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #4  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #5  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==    L173 (0x402F420)
> ==5733==     #0  pthread_mutex_lock /mnt/data/build/slave/full_linux_build/build/tsan/ts_valgrind_intercepts.c:935
> ==5733==     #1  remoteDispatchOpenHelper /usr/src/debug/libvirt-0.9.12/daemon/remote.c:683
> ==5733==     #2  virNetServerProgramDispatch /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserverprogram.c:416
> ==5733==     #3  virNetServerHandleJob /usr/src/debug/libvirt-0.9.12/src/rpc/virnetserver.c:161
> ==5733==     #4  virThreadPoolWorker /usr/src/debug/libvirt-0.9.12/src/util/threadpool.c:144
> ==5733==     #5  virThreadHelper /usr/src/debug/libvirt-0.9.12/src/util/threads-pthread.c:161
> ==5733==    Race verifier data: 0x347203A572,0x347203A565
> ==5733== }}}
> 
> 
>   I didn't know that netcf was doing Relax-NG validation on-the-fly :)
> and hit another  in libxml2 though libvirt/netcf/libxslt/libxml2
> call:
> 
> ==5733== WARNING: Possible data race during read of size 4 at 0x34723B10B4: {{{
> ==5733==    T1 (L{L173}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlDictCreate /usr/src/debug/libxml2-2.8.0/dict.c:491
> ==5733==     #4  xmlInitParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1620
> ==5733==     #5  xmlNewParserCtxt /usr/src/debug/libxml2-2.8.0/parserInternals.c:1874
> ==5733==     #6  xmlCreateURLParserCtxt /usr/src/debug/libxml2-2.8.0/parser.c:13546
> ==5733==     #7  xmlReadFile /usr/src/debug/libxml2-2.8.0/parser.c:14713
> ==5733==     #8  xmlRelaxNGParse /usr/src/debug/libxml2-2.8.0/relaxng.c:7480
> ==5733==     #9  (no symbols) /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #10 ncf_init /usr/lib64/libnetcf.so.1.4.0
> ==5733==     #11 interfaceOpenInterface /usr/src/debug/libvirt-0.9.12/src/interface/netcf_driver.c:144
> ==5733==   Concurrent write(s) happened at (OR AFTER) these points:
> ==5733==    T10 (L{L156}):
> ==5733==     #0  random_r /lib64/libc-2.14.90.so
> ==5733==     #1  random /lib64/libc-2.14.90.so
> ==5733==     #2  rand /lib64/libc-2.14.90.so
> ==5733==     #3  xmlHashCreate /usr/src/debug/libxml2-2.8.0/hash.c:193
> ==5733==     #4  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:4125
> ==5733==     #5  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6380
> ==5733==     #6  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==     #7  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6319
> ==5733==     #8  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6520
> ==5733==     #9  xmlRelaxNGCheckRules /usr/src/debug/libxml2-2.8.0/relaxng.c:6523
> ==5733==   Address 0x34723B10B4 is 20 bytes inside data symbol "randtbl"
> ==5733==   Locks involved in this report (reporting last lock sites): {L156, L173}
> 
>   Looks like another libxml2 bug calling rand, I should use rand_r() instead
> if available.

Yep.

>   So except for the parallel xdr_remote_nonnull_string() issue
> I don't think there is anything  I can report at this point for libvirt
> code itself.

I think that is bogus.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]