
Re: [libvirt] [libvirt-1.0.5]deadlock in child process after call function backtrace, any suggestions is appreciate!



On 07/04/2013 01:43 AM, Caizhifeng wrote:
> Hi ALL,
>
> In order to catch the calltrace of deadlock in libvirtd, I modified the function virMutexLock as follows:
>
> struct virMutex {
>     pthread_mutex_t lock;
>     void *trace[TRACE_SIZE];    //added for test
>     int ntrace;                         //added for test
> };
>
> void virMutexLock(virMutexPtr m)
> {
>     struct timespec ts;
>
>     if (0 == clock_gettime(CLOCK_REALTIME, &ts)) {
>         ts.tv_sec += LOCK_TIMEOUT;
>         if (pthread_mutex_timedlock(&m->lock, &ts) == ETIMEDOUT) {
>             if (m->ntrace > 0)
>                 virLogBacktrace(m->ntrace, m->trace);
>             pthread_mutex_lock(&m->lock);
>         }
>
>         m->ntrace = backtrace(m->trace, TRACE_SIZE);            //record the call trace information.
>     } else {
>         pthread_mutex_lock(&m->lock);
>     }
> }
>
> The original is :
> void virMutexLock(virMutexPtr m){
>     pthread_mutex_lock(&m->lock);
> }
>
> But unfortunately, a deadlock sometimes happens in the child process after virFork,

The problem is that backtrace() is not async-signal-safe. The section
"Async-signal-safe functions" of "man 7 signal" explains what this means
in the context of a signal handler, but the conditions in a child process
just after fork() are essentially the same. A lock is acquired in one
thread of the parent; then, *while that lock is being held*, a different
thread of the parent calls fork(). The child gets a copy of all of the
process's memory (including the lock), but only the thread that called
fork() exists in it. So in the new process the lock comes into existence
marked as held, with no thread to unlock it, and when the child attempts
to acquire the lock, it waits forever. In your trace, that lock is the
one taken inside dl_iterate_phdr(), which backtrace() reaches via
_Unwind_Find_FDE().
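
To make that concrete, here is a minimal sketch (not libvirt code; all
names are made up for illustration). A helper thread holds a mutex while
the main thread forks, and the child uses pthread_mutex_trylock() rather
than pthread_mutex_lock() so that it reports the state of the lock
instead of hanging. Strictly speaking trylock isn't async-signal-safe
either; it is used here only to observe the inherited state.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;   /* posted once the helper thread holds demo_lock */
static sem_t may_release;  /* posted when the helper may drop demo_lock */

/* Helper thread: take the lock and keep it until told to let go. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&lock_taken);
    sem_wait(&may_release);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found demo_lock already held, 0 if it did not,
 * and -1 on error. */
int child_inherits_held_lock(void)
{
    pthread_t tid;
    int status = 0;
    pid_t pid;

    if (sem_init(&lock_taken, 0, 0) != 0 || sem_init(&may_release, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&lock_taken);          /* the helper thread now owns demo_lock */

    pid = fork();                   /* only this thread exists in the child */
    if (pid == 0) {
        /* The copied mutex is marked as held, but its owner was not copied.
         * A plain pthread_mutex_lock() here would block forever. */
        int rc = pthread_mutex_trylock(&demo_lock);
        _exit(rc == EBUSY ? 1 : 0);
    }

    sem_post(&may_release);         /* parent: let the helper finish */
    pthread_join(tid, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling child_inherits_held_lock() from a test program built with
-pthread should return 1. Replace the trylock with a plain lock and the
child hangs in the same way as the one in your gdb output.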

To prevent this scenario, functions that aren't async-signal-safe
shouldn't be called in child processes of multithreaded parents (at
least not until a subsequent exec() has replaced the running code with
something new).
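
As a rough illustration of that rule (a sketch only, not how virFork() or
virExec() are written; run_shell() is a made-up name), the child below
sticks to write(), execv() and _exit(), all of which are on the
async-signal-safe list, until the exec replaces the process image:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs "/bin/sh -c script" and returns its exit status, or -1 on error. */
int run_shell(const char *script)
{
    char *const argv[] = { "sh", "-c", (char *)script, NULL };
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only async-signal-safe calls between fork() and exec.
         * No malloc(), no stdio, no logging, no backtrace(). */
        static const char msg[] = "child: about to exec\n";
        ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
        (void)ignored;
        execv("/bin/sh", argv);
        _exit(127);                 /* reached only if execv() failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_shell("exit 3") returns 3. Anything that needs locks
(logging, backtraces, allocation) belongs either before the fork() in the
parent or after the exec in the new program.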

I suppose you could solve this by putting a wrapper around backtrace()
that first acquires a lock that's visible to libvirt's code, then making
sure to acquire that same lock just before fork() and release it in both
the parent and the child just after fork(). As an example, look at the
way the logging lock is acquired/released just before/after fork() in
virFork(). Since your lock would be used inside libvirt's virMutex*()
code itself, you'll of course need to use lower-level primitives (raw
pthread calls) for it.
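
Something along these lines is what I have in mind. This is an untested
sketch with hypothetical names (trace_lock, locked_backtrace(),
fork_holding_trace_lock()), using raw pthread calls because it sits
underneath virMutexLock(). Note that it only serializes against
backtrace() calls made through the wrapper; anything else in the process
that takes the same internal loader lock is not covered.

```c
#include <execinfo.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock guarding every backtrace() call made via the wrapper.
 * A raw pthread mutex, since a virMutex here would recurse into itself. */
static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;

/* Use in place of backtrace(): no fork can happen mid-unwind as long as
 * every fork goes through fork_holding_trace_lock(). */
int locked_backtrace(void **buf, int size)
{
    int n;

    pthread_mutex_lock(&trace_lock);
    n = backtrace(buf, size);
    pthread_mutex_unlock(&trace_lock);
    return n;
}

/* Use in place of fork(): holding trace_lock across the fork means no
 * other thread can be inside locked_backtrace() at that moment. */
pid_t fork_holding_trace_lock(void)
{
    pid_t pid;

    pthread_mutex_lock(&trace_lock);
    pid = fork();
    /* Runs in the parent and, when fork() succeeded, in the child too.
     * The child's only thread is a copy of the one that took the lock. */
    pthread_mutex_unlock(&trace_lock);
    return pid;
}
```

In your patch that would mean calling locked_backtrace() from
virMutexLock() and taking trace_lock in virFork() next to the existing
logging lock.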

(BTW, the virLog mutex is being acquired/released for exactly the same
reason - so that logging functions which aren't async-signal-safe can be
reliably called in the child process.)

>  the parent libvirtd process's pid is 2987, and the child libvirtd process's pid is 29509; the child was forked in order to run a shell script.
>
> root cvk143:~# ps -ef | grep libvirtd
> root      2987     1 52 08:36 ?        00:40:38 /usr/sbin/libvirtd -d
> root     29509  2987  0 09:38 ?        00:00:00 /usr/sbin/libvirtd -d
> root cvk143:~#
>
> the child process's call trace is as follow:
> (gdb) bt
> #0  0x00007f4d5e7fd89c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f4d5e7f9080 in _L_lock_903 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2  0x00007f4d5e7f8f19 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
> #3  0x00007f4d5e5607db in dl_iterate_phdr () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x00007f4d5c3c88b6 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
> #5  0x00007f4d5c3c5d70 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
> #6  0x00007f4d5c3c6490 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
> #7  0x00007f4d5c3c6d3e in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
> #8  0x00007f4d5e53b1c8 in backtrace () from /lib/x86_64-linux-gnu/libc.so.6
> #9  0x00007f4d5f6f8c92 in virMutexLock (m=0x7f4d5fadedc0) at util/virthreadpthread.c:128
> #10 0x00007f4d5f6d0a3d in virLogLock () at util/virlog.c:152
> #11 0x00007f4d5f6d0f66 in virLogReset () at util/virlog.c:311
> #12 0x00007f4d5f6b3139 in virFork (pid=0x7f4d5a1a4310) at util/vircommand.c:281
> #13 0x00007f4d5f6b3b06 in virExec (cmd=0x7f4d2000a6f0) at util/vircommand.c:493
> #14 0x00007f4d5f6b8a34 in virCommandRunAsync (cmd=0x7f4d2000a6f0, pid=0x0) at util/vircommand.c:2340
> #15 0x00007f4d5f6b815c in virCommandRun (cmd=0x7f4d2000a6f0, exitstatus=0x7f4d5a1a4728) at util/vircommand.c:2191
> #16 0x00007f4d5f6b4bdd in virRun (argv=0x7f4d5a1a4730, status=0x7f4d5a1a4728) at util/vircommand.c:776
> #17 0x00007f4d60231006 in virStorageNFSPoolCheckSub (hostName=0x7f4d50011630 "192.168.0.6", hostDir=0x7f4d50014560 "/vms/isos")
>     at storage/storage_backend.c:165
> #18 0x00007f4d602312fa in virStoragePoolCheckFirst (pool=0x7f4d5000dcc0) at storage/storage_backend.c:255
> #19 0x00007f4d602383af in virStorageBackendFileSystemRefresh (conn=0x7f4d3c0010a0, pool=0x7f4d5000dcc0) at storage/storage_backend_fs.c:887
> #20 0x00007f4d6022c458 in storagePoolRefresh (obj=0x7f4d200012e0, flags=0) at storage/storage_driver.c:1705
> #21 0x00007f4d5f7ac445 in virStoragePoolRefresh (pool=0x7f4d200012e0, flags=0) at libvirt.c:12936
> #22 0x00007f4d6015c62f in remoteDispatchStoragePoolRefresh (server=0x7f4d60bb4ec0, client=0x7f4d60bbc380, msg=0x7f4d60bbcae0, rerr=0x7f4d5a1a4af0,
>     args=0x7f4d2000f0b0) at remote_dispatch.h:12867
> #23 0x00007f4d6015c527 in remoteDispatchStoragePoolRefreshHelper (server=0x7f4d60bb4ec0, client=0x7f4d60bbc380, msg=0x7f4d60bbcae0, rerr=0x7f4d5a1a4af0,
>     args=0x7f4d2000f0b0, ret=0x7f4d200144f0) at remote_dispatch.h:12845
> #24 0x00007f4d5f7fe7d5 in virNetServerProgramDispatchCall (prog=0x7f4d60bbfc90, server=0x7f4d60bb4ec0, client=0x7f4d60bbc380, msg=0x7f4d60bbcae0)
>     at rpc/virnetserverprogram.c:439
> #25 0x00007f4d5f7fe34e in virNetServerProgramDispatch (prog=0x7f4d60bbfc90, server=0x7f4d60bb4ec0, client=0x7f4d60bbc380, msg=0x7f4d60bbcae0)
>     at rpc/virnetserverprogram.c:305
> #26 0x00007f4d5f7f72ea in virNetServerProcessMsg (srv=0x7f4d60bb4ec0, client=0x7f4d60bbc380, prog=0x7f4d60bbfc90, msg=0x7f4d60bbcae0)
>     at rpc/virnetserver.c:162
> #27 0x00007f4d5f7f73cd in virNetServerHandleJob (jobOpaque=0x7f4d60bbdbf0, opaque=0x7f4d60bb4ec0) at rpc/virnetserver.c:183
> #28 0x00007f4d5f6f9602 in virThreadPoolWorker (opaque=0x7f4d60b9aa40) at util/virthreadpool.c:144
> #29 0x00007f4d5f6f8fd6 in virThreadHelper (data=0x7f4d60b9a9b0) at util/virthreadpthread.c:212
> #30 0x00007f4d5e7f6e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #31 0x00007f4d5e5244bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #32 0x0000000000000000 in ?? ()
> (gdb)
> ..............
>
> I googled and found a similar deadlock call trace at this link:  code.google.com/p/gperftools/issues/detail?id=66  , but it does not involve libvirtd.
> ......
>                 Labels: -Priority-Medium Priority-Low
>                 Jan 22, 2013
>                 #32 DarkAge    gmail com
>
>                 Please note the glibc unwinder also uses dl_iterate_phdr, and takes several locks during backtrace generation in _Unwind_Find_FDE.
>
>                 #0  0x00007ffff7bc6e80 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
>                 #1  0x00007ffff7212fdb in dl_iterate_phdr () from /lib/x86_64-linux-gnu/libc.so.6
>                 #2  0x00007ffff6be28b6 in _Unwind_Find_FDE () from /lib/libgcc_s.so.1
>                 #3  0x00007ffff6bdfd70 in ?? () from /lib/libgcc_s.so.1
>                 #4  0x00007ffff6be0d7d in _Unwind_Backtrace () from /lib/libgcc_s.so.1
>                 #5  0x00007ffff71ed9c8 in backtrace () from /lib/x86_64-linux-gnu/libc.so.6
>                 #6  0x000000000040fdb5 in glibc_backtrace (error=<synthetic pointer>, buffer=0x7fffffffdd10,
>                     size=<optimized out>, ucontext=<optimized out>)
> ......
>
>
> Has anyone ever encountered a similar problem? It would be great if someone could help me with this.
> Thank you very much.

