
[libvirt] domain restore race condition

As noted in another message, the problem I was seeing is a race condition in qemudDomainRestore(), not with my modifications to qemudDomainSave(). Here's some discussion about that problem from IRC, with a question at the bottom:

<laine> Does anyone else see a failure of domain restore (immediately after domain save)? I'm very definitely seeing it on my machine with F12+updates testing and libvirt built from unpatched sources.
<laine> It's very reproducible - with virsh I do "save domain filename", then "restore filename" and it pretty much always gives me a black screen. Then I force shutdown the guest (with virt-manager) and do "restore filename" again. Tada! It's restored and running!
<danpb> laine: possible race condition
<danpb> laine: try putting a sleep(10) before the qemuMonitorStartCPUs in qemuDomainRestore()

Dan's suggestion *did* eliminate the failures.

<danpb> laine: this sounds like the issue with libvirt prematurely starting execution of the CPUs before QEMU has even started restoring (or something like that)
<danpb> laine: search the archives for a mail from Charles Duffy on this subject some time ago

Here's the BZ filed by Charles Duffy


It looks like he's dealing with a race condition earlier in the restore, since his solution was to wait for the migration process to terminate somewhere inside qemudStartVMDaemon(), rather than waiting until qemudStartVMDaemon() was finished (which is what it does now). Since this wait has already been done anyway by the time of Dan's sleep(10) in my test, I don't think Charles' patch would help this situation.

So is there something that libvirt can wait on here to ensure proper start? Or is there a problem in qemu? (I'm still running 0.11. I'll also try upgrading to 0.12 and see if there are changes in behavior.)
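To make the question concrete: instead of a fixed sleep(10), the restore path could poll some "restore finished" condition with a timeout before starting the CPUs. The following is only a sketch under assumptions. wait_until_ready(), ready_fn, and the fake_restore stand-in are hypothetical names, not libvirt or qemu code, and whether qemu exposes anything suitable to poll is exactly the open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the restore has finished. */
typedef bool (*ready_fn)(void *opaque);

/*
 * Poll ready() every interval_ms until it returns true or roughly
 * timeout_ms of sleeping has accumulated.
 * Returns 0 if the condition became true, -1 on timeout.
 */
int wait_until_ready(ready_fn ready, void *opaque,
                     long timeout_ms, long interval_ms)
{
    long waited = 0;

    assert(ready != NULL);
    assert(interval_ms > 0);

    for (;;) {
        struct timespec ts;

        if (ready(opaque))
            return 0;
        if (waited >= timeout_ms)
            return -1;

        ts.tv_sec = interval_ms / 1000;
        ts.tv_nsec = (interval_ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
        waited += interval_ms;
    }
}

/* Stand-in for a real query: reports "done" after a fixed number of polls. */
struct fake_restore {
    int polls_left;
};

bool fake_restore_done(void *opaque)
{
    struct fake_restore *r = opaque;

    if (r->polls_left > 0) {
        r->polls_left--;
        return false;
    }
    return true;
}
```

In the real code the predicate would have to ask qemu something meaningful, and the CPUs would be started only when wait_until_ready() returns 0. The timeout keeps a hung qemu from blocking the restore forever.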
