
Re: [libvirt] Two core dumps are generated in multi-thread scenarios



2012/9/23 Benjamin Wang (gendwang) <gendwang cisco com>:
> Hi,
>   I found two core dumps generated in multi-thread scenarios in ESX part.
>
> Case1: libcurl support multi-thread
> core dump:
> #12 0x00002aaabea89712 in addbyter () from /usr/local/lib/libcurl.so.4
> #13 0x00002aaabea89b86 in dprintf_formatf () from /usr/local/lib/libcurl.so.4
> #14 0x00002aaabea8b055 in curl_mvsnprintf () from /usr/local/lib/libcurl.so.4
> #15 0x00002aaabea7678f in Curl_failf () from /usr/local/lib/libcurl.so.4
> #16 0x00002aaabea6d871 in Curl_resolv_timeout () from /usr/local/lib/libcurl.so.4
> #17 0x00000006e8a8f230 in ?? ()
>
> Fix code:
> esxVI_CURL_Connect() in esx_vi.c:
> I add a new line as following:
> curl_easy_setopt(curl->handle, CURLOPT_NOSIGNAL, 1);

It took me a while reading the libcurl code to figure out what might
be happening here. Curl_resolv_timeout implements its timeout with
SIGALRM + sigsetjmp/siglongjmp. That implementation is not
thread-safe, because the SIGALRM handler might run on a different
thread than the one that originally called Curl_resolv_timeout. The
siglongjmp in the handler then makes Curl_resolv_timeout continue on
that other thread. Setting CURLOPT_NOSIGNAL to 1 makes libcurl avoid
the SIGALRM + sigsetjmp/siglongjmp implementation. This solves the
problem, but at the cost of losing the timeout capability.
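
To make the mechanism concrete, here is a minimal sketch of the
general SIGALRM + sigsetjmp/siglongjmp timeout pattern. It is not
libcurl's actual code: run_with_alarm_timeout, slow_lookup and
fast_lookup are hypothetical names made up for illustration, and
slow_lookup merely simulates a stuck DNS lookup with sleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Jump target shared with the signal handler. There is one per
 * process, not one per thread, which is why this pattern breaks as
 * soon as several threads are involved. */
static sigjmp_buf timeout_env;

static void on_alarm(int signo)
{
    (void)signo;
    /* Jumps back to the last sigsetjmp() call, no matter which
     * thread the kernel picked to run this handler. */
    siglongjmp(timeout_env, 1);
}

/* Hypothetical helper: runs op() and returns 0 if it finished, or 1
 * if it was aborted after `seconds`. Only sound when the process has
 * a single thread. */
int run_with_alarm_timeout(void (*op)(void), unsigned int seconds)
{
    struct sigaction sa, old_sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);

    /* Install the handler before sigsetjmp() so that old_sa is not
     * modified between the setjmp and a later longjmp. */
    sigaction(SIGALRM, &sa, &old_sa);

    if (sigsetjmp(timeout_env, 1) != 0) {
        /* Reached via siglongjmp() from on_alarm(): timed out. */
        sigaction(SIGALRM, &old_sa, NULL);
        return 1;
    }

    alarm(seconds);   /* arm the timer */
    op();             /* the potentially blocking call */
    alarm(0);         /* finished in time: disarm the timer */
    sigaction(SIGALRM, &old_sa, NULL);
    return 0;
}

/* Stand-ins for a name lookup that hangs and one that returns. */
void slow_lookup(void) { sleep(5); }
void fast_lookup(void) { }
```

In a single-threaded program run_with_alarm_timeout(slow_lookup, 1)
gives up after about one second and returns 1, while
run_with_alarm_timeout(fast_lookup, 1) returns 0. With several
threads, SIGALRM can be delivered to any thread that does not block
it, so the siglongjmp may run on a thread that never called
sigsetjmp.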

In your case a DNS lookup took longer than libcurl was willing to wait
and the timeout aborted it. But the call to Curl_failf (part of the
timeout error handling) was made on the wrong thread (I think), which
made it segfault. IMHO there is no ideal solution here. With
CURLOPT_NOSIGNAL set to 0 (the default) libcurl can do DNS lookups
with a timeout, but the error handling might occur on the wrong
thread. With CURLOPT_NOSIGNAL set to 1 the segfault is avoided, but
libcurl might get stuck in a DNS lookup.

Are you able to reproduce this problem and can you confirm that
setting CURLOPT_NOSIGNAL to 1 fixes it?

-- 
Matthias Bolte
http://photron.blogspot.com

