[libvirt] [PATCH] create a thread to handle MigrationParamReset to avoid deadlock

Daniel P. Berrangé berrange at redhat.com
Mon Dec 23 16:21:30 UTC 2019


On Mon, Dec 23, 2019 at 04:50:00PM +0100, Michal Prívozník wrote:
> On 12/23/19 11:12 AM, Daniel P. Berrangé wrote:
> > On Mon, Dec 23, 2019 at 03:13:10PM +0800, Yi Wang wrote:
> >> From: Li XueLei <li.xuelei1 at zte.com.cn>
> >>
> >> Libvirtd no longer receives external requests, after I do the following.
> >>
> >> Steps to reproduce:
> >> 1.Virsh tool initiates a non-p2p migrations.
> >> 2.After the phase of qemuDomainMigratePerform3 and before the phase of
> >>   qemuDomainMigrateConfirm3, kill the virsh client which initiated the
> >>   migration.
> >>
> >> What happens:
> >>     Libvirtd will be blocked in the next call stack because it can't
> >> get the condition variables, and it will not be able to receive new
> >> requests.
> >>
> >> Thread 1 (Thread 0x7f36a7b9f8c0 (LWP 27556)):
> >> #0  0x00007f36a45799f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> >> #1  0x00007f36a6e7a97e in virCondWait () from /lib64/libvirt.so.0
> >> #2  0x00007f3679c6a1b3 in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #3  0x00007f3679c837de in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #4  0x00007f3679c93c5a in qemuMonitorJSONSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #5  0x00007f3679c786ed in qemuMonitorSetMigrationCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #6  0x00007f3679c67857 in qemuMigrationParamsApply () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #7  0x00007f3679c67eff in qemuMigrationParamsReset () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #8  0x00007f3679c5198a in qemuMigrationSrcCleanup () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #9  0x00007f36a6dcd862 in virCloseCallbacksRun () from /lib64/libvirt.so.0
> >> #10 0x00007f3679cc28f6 in qemuConnectClose () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> >> #11 0x00007f36a70a0870 in virConnectDispose () from /lib64/libvirt.so.0
> >> #12 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> >> #13 0x00007f36a70a55c4 in virConnectClose () from /lib64/libvirt.so.0
> >> #14 0x000056058a206f92 in remoteClientFree ()
> >> #15 0x00007f36a6fa49a0 in virNetServerClientDispose () from /lib64/libvirt.so.0
> >> #16 0x00007f36a6e3f43b in virObjectUnref () from /lib64/libvirt.so.0
> >> #17 0x00007f36a6e3fd21 in virObjectFreeCallback () from /lib64/libvirt.so.0
> >> #18 0x00007f36a6f9942e in virNetSocketEventFree () from /lib64/libvirt.so.0
> >> #19 0x00007f36a6de8439 in virEventPollCleanupHandles () from /lib64/libvirt.so.0
> >> #20 0x00007f36a6de9ab6 in virEventPollRunOnce () from /lib64/libvirt.so.0
> >> #21 0x00007f36a6de817a in virEventRunDefaultImpl () from /lib64/libvirt.so.0
> >> #22 0x00007f36a6faadb5 in virNetDaemonRun () from /lib64/libvirt.so.0
> >> #23 0x000056058a1c5cc1 in main ()
> > 
> > Hmm, this is a bit strange - why are we trying to reset migration parameters
> > in a connection close callback in the first place ?
> 
> Because we are cancelling the ongoing migration because the connection
> that initiated it was closed.
> 
> > 
> > Libvirt allows multiple connections concurrently and changes made by one
> > connection are supposed to apply globally. No per-VM state should be tied
> > to a particular connection.
> 
> This is generally very true, except for migration.

Hmm, so in that case we do need to make sure this runs from a non-main
event thread as this patch does. I'm surprised we've not seen this
before though - i'd think it should be a guaranteed deadlock anytime
someone tests this scenario.

> 
> > 
> > IOW, I don't think we need a thread. We should just stop calling
> > qemuMigrationSrcCleanup from the connection close callback entirely
> 
> But I agree that something fishy is going on and this doesn't look like
> proper solution. Yi, can you please share the backtrace of other threads
> too? Why aren't we getting any reply on the monitor?

qemuMonitorSend() just puts date onto the TX queue & waits for the
main event loop to send it.  We're invoking qemuMonitorSend from
the main event loop in this backtrace, hence it'll wait forever
for the thread to send it.

qemuMonitorSend must never be invoked from the main event thread.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




More information about the libvir-list mailing list