[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] cont command failing via JSON monitor on restore

Daniel P. Berrange wrote:
> On Wed, Jan 12, 2011 at 11:29:27PM -0500, Laine Stump wrote:
>> On 01/12/2011 05:13 PM, Jim Fehlig wrote:
>>> libvirt 0.8.7
>>> qemu 0.13
>>> I'm looking into a problem with qemu save/restore via JSON monitor.  On
>>> restore, the vm is left in a paused state with following error returned
>>> for 'cont' command
>>> An incoming migration is expected before this command can be executed
>>> I was trying to debug the issue in gdb, but stepping through the code
>>> introduces enough delay between qemudStartVMDaemon() and doStartCPUs()
>>> that the latter succeeds.  Any suggestions on how to determine when it
>>> is safe to call doStartCPUs() to prevent the above error?  I don't see
>>> this issue with the text monitor btw.
>> I'm pretty sure this is related to a bug I reported on qemu-devel
>> last April:
>>    http://lists.gnu.org/archive/html/qemu-devel/2010-04/msg00635.html
>> (be sure to read my own followup if you want a correct description
>> of the circumstances). In this case libvirt was using the text
>> monitor, and there was a race condition between qemudStartVMDaemon
>> (which executes qemu with '-S -incoming') and doStartCPUs() (which
>> issues a 'cont' command to the qemu monitor). The result would be
>> that sometimes the 'cont' would be received and processed by qemu
>> before the incoming migration had started, meaning that qemu would
>> be executing garbage memory instead of the saved/restored image of
>> the guest.
>> The solution to this was posted to upstream qemu in July:
>>   http://lists.gnu.org/archive/html/qemu-devel/2010-07/msg01574.html
>> and I believe is in qemu 0.13. That patch adds a check to the 'cont'
>> command so that if '-incoming' was specified on the commandline,
>> 'cont' will only execute after a migration has successfully
>> completed, but will otherwise return an error.
>> Actually, thinking about this "fix", it seems that it isn't really a
>> solution, because instead of the guest starting up in an
>> indeterminate state, doStartCPUs() will just fail (as you've seen)
>> making the entire guest startup fail.
> Yep, it wasn't really intended as a fix. It was intended to make
> the error scenario clearly detectable, which has succeeded as per
> Jim's report. The fact that QMP returned an error in this way,
> means we can now reliably detect, usleep(1000), and then retry
> the 'cont' again at which point we should succeed.

Something like the attached patch?  I'm not quite sure about the retry
bounds (currently 1 second).  In my testing, it always succeeds on the
second attempt - even with large memory vms.


>From ec9109b40a4b2c45035495e0e4f65824a92dcf3d Mon Sep 17 00:00:00 2001
From: Jim Fehlig <jfehlig novell com>
Date: Thu, 13 Jan 2011 12:52:23 -0700
Subject: [PATCH] qemu: Retry JSON monitor cont cmd on MigrationExpected error

When restoring a saved qemu instance via JSON monitor, the vm is
left in a paused state.  Turns out the 'cont' cmd was failing with
"MigrationExpected" error class and "An incoming migration is
expected before this command can be executed" error description
due to migration (restore) not yet complete.

Detect if 'cont' cmd fails with "MigrationExpecte" error class and
retry 'cont' cmd.
 src/qemu/qemu_monitor_json.c |   20 +++++++++++++++++---
 1 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 7877731..63b736e 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -702,13 +702,27 @@ qemuMonitorJSONStartCPUs(qemuMonitorPtr mon,
     int ret;
     virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("cont", NULL);
     virJSONValuePtr reply = NULL;
+    int i = 0, timeout = 3;
     if (!cmd)
         return -1;
-    ret = qemuMonitorJSONCommand(mon, cmd, &reply);
+    do {
+        ret = qemuMonitorJSONCommand(mon, cmd, &reply);
-    if (ret == 0)
-        ret = qemuMonitorJSONCheckError(cmd, reply);
+        if (ret != 0)
+            break;
+        /* If no error, we're done */
+        if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
+            break;
+        /* If error class is not MigrationExpected, we're done.
+         * Otherwise try 'cont' cmd again */
+        if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
+            break;
+        virJSONValueFree(reply);
+    } while ((++i <= timeout) && (usleep(250000) <=0));

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]