[libvirt] qemu: lifecycle: reboot + shutdown, unexpected vm status.

zhang bo oscar.zhangbo at huawei.com
Thu Apr 16 06:02:37 UTC 2015


Steps:
    1 virsh reboot guest1 --mode=acpi
    2 virsh shutdown guest1 --mode=agent


Expected result:
    Since the SHUTDOWN job comes after the REBOOT job, we expected the guest to end up *shut off*. (Do you agree?)


Actual result:
    After the 2 steps above, the guest got *rebooted*.


The reason for this problem:
    1 In qemuDomainReboot() (acpi mode), priv->fakeReboot is set to 1.
    2 After the shutdown/reboot request, the qemu monitor IO triggers qemuProcessHandleShutdown(), which finds that priv->fakeReboot is 1 and reboots the guest.


Root Cause of the problem:
    After looking further into the problem, we found that the design of acpi/agent shutdown/reboot seems a little chaotic.
    -----------------------------------
    sheet1 who sets fakeReboot
    -----------------------------------
             shutdown           reboot
    acpi        Y(0)              Y(1)
    agent       N                 N
    Apparently, only *acpi-mode* jobs set fakeReboot.
    -----------------------------------
    sheet2 who needs to check fakeReboot (in qemuProcessHandleShutdown())
    -----------------------------------
             shutdown           reboot
    acpi        Y                 Y
    agent       *Y*               *N*
    Things become a little odd here: only agent-mode reboot doesn't check fakeReboot.

    We can tell from the two sheets above that they are not consistent.
    *Agent-mode shutdown needs to check fakeReboot (triggered by the SHUTDOWN monitor IO event), although it never set it beforehand.*

    The chaos is not caused by libvirtd alone; it is a systemic problem that also involves the guest OS and qemu. (The guest OS writes to ACPI, which triggers qemu, which then triggers libvirtd.)


My Solution:
    A simple solution is to make the 1st sheet consistent with the 2nd sheet, that is:
    -----------------------------------
    sheet3 who should set fakeReboot:
    -----------------------------------
             shutdown           reboot
    acpi        Y(0)              Y(1)
    agent       *Y(0)*            N
    -----------------------------------
    We let agent-mode shutdown set fakeReboot to 0.
--------------------------------------------------------------------------------
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 7eb5a7d..c751dcf 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -1959,6 +1959,8 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags)
         goto endjob;
     }

+    qemuDomainSetFakeReboot(driver, vm, isReboot);
+
     if (useAgent) {
         qemuDomainObjEnterAgent(vm);
         ret = qemuAgentShutdown(priv->agent, agentFlag);
@@ -1970,7 +1972,6 @@ static int qemuDomainShutdownFlags(virDomainPtr dom, unsigned int flags)
      */
     if (!useAgent ||
         (ret < 0 && (acpiRequested || !flags))) {
-        qemuDomainSetFakeReboot(driver, vm, isReboot);

         /* Even if agent failed, we have to check if guest went away
          * by itself while our locks were down.  */
----------------------------------------------------------------------------------


Discussion:
    Although the solution above seems to solve the problem, it feels a little awkward to have sheets like this.
    Maybe we should make sheet2 consistent with sheet1, rather than making sheet1 consistent with sheet2. But that is too difficult to implement; it seems impossible.
    So, is there a better way to solve this problem? Or shall I commit this patch?




