
Re: [libvirt] PATCH: Disable QEMU drive caching

Daniel P. Berrange wrote:
QEMU defaults to allowing the host OS to cache all disk I/O. This has a
couple of problems:

Oh, say it ain't so.  This is precisely what I didn't want to see happen :-(

 - It is a waste of memory because the guest already caches I/O ops

Page cache memory is easily reclaimable and has relatively low priority. If a guest needs memory, the size of the page cache will be reduced.

 - It is unsafe on host OS crash - all unflushed guest I/O will be
   lost, and there are no ordering guarantees, so metadata updates could
   be flushed to disk while the journal updates were not. Say goodbye
   to your filesystem.

This has nothing to do with cache=off. The IDE device defaults to write-back caching. As such, IDE makes no guarantee that when a data write completes, it's actually completed on disk. This only comes into play when write-back is disabled. I'm perfectly happy to accept a patch that adds explicit syncs when write-back is disabled.

For SCSI, an unordered queue is advertised. Again, everything depends on whether or not write-back caching is enabled. I'm equally happy to take patches here.
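
The "explicit syncs" idea is only described in prose here. As a minimal sketch, assuming a file-backed image written with POSIX `pwrite()`, it could look like the following. The helper `emulated_disk_write` and its `guest_write_back_enabled` parameter are hypothetical names, not QEMU's IDE or SCSI code: while the guest has the drive's write-back cache enabled the write returns as soon as the host accepts it, and once the guest disables write-back each write is followed by `fdatasync()`.

```c
#define _XOPEN_SOURCE 700   /* exposes pwrite() and fdatasync() */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): complete one guest write to a
 * file-backed disk image. With the emulated write-back cache enabled, the
 * guest has been told completed writes may still be volatile, so returning
 * after pwrite() is acceptable. With write-back disabled, the write is only
 * reported complete after fdatasync() has asked the host kernel to write
 * the file's data to the underlying device.
 */
static int emulated_disk_write(int fd, const void *buf, size_t len,
                               off_t off, bool guest_write_back_enabled)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;           /* other errors: errno set by pwrite() */
        }
        done += (size_t)n;
    }

    if (!guest_write_back_enabled && fdatasync(fd) < 0)
        return -1;               /* errno set by fdatasync() */

    return 0;
}
```

Guest-issued cache-flush commands could be handled the same way, by calling `fdatasync()` on the image; that path is not shown.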

More importantly, the most common journaled filesystem, ext3, does not enable write barriers by default (even for journal updates). This is how it ships in Red Hat distros. So there is no greater risk of corrupting a journal in QEMU than there is on bare metal.

 - It makes benchmarking more or less impossible / worthless because
   what the benchmark thinks are disk writes just sit around in memory,
   so guest disk performance appears to exceed host disk performance.

It just means you have to understand the extra level of caching.

Many virtualization users are doing some form of homogeneous consolidation. If they have a good set of management tools or sophisticated storage, then their guests will be sharing base images or something like that. Caching in the host will result in major performance improvements, because otherwise the same data would be fetched from disk multiple times.

This patch disables caching on all QEMU guests. NB, Xen has long done this
for both PV & HVM guests

Actually, they don't for HVM. When using file: for PV disks, I/O also goes through the host page cache. For HVM, Xen uses the write-back-disabled synchronization I mentioned earlier.

This is a really bad thing to do by default. I don't even think it should be an option for users because it's so terribly misunderstood.


Anthony Liguori

 - QEMU only gained this ability when -drive was
introduced, and sadly kept the default to unsafe cache=on settings.


diff -r 4a0ccc9dc530 src/qemu_conf.c
--- a/src/qemu_conf.c	Wed Oct 08 11:53:45 2008 +0100
+++ b/src/qemu_conf.c	Wed Oct 08 11:59:33 2008 +0100
@@ -460,6 +460,8 @@
         flags |= QEMUD_CMD_FLAG_DRIVE;
     if (strstr(help, "boot=on"))
         flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
+    if (strstr(help, "cache=on"))
+        flags |= QEMUD_CMD_FLAG_DRIVE_CACHE;
     if (version >= 9000)
         flags |= QEMUD_CMD_FLAG_VNC_COLON;
@@ -959,13 +961,15 @@
-            snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s",
+            snprintf(opt, PATH_MAX, "file=%s,if=%s,%sindex=%d%s%s",
                      disk->src ? disk->src : "", bus,
                      media ? media : "",
                      bootable &&
                      disk->device == VIR_DOMAIN_DISK_DEVICE_DISK
-                     ? ",boot=on" : "");
+                     ? ",boot=on" : "",
+                     qemuCmdFlags & QEMUD_CMD_FLAG_DRIVE_CACHE
+                     ? ",cache=off" : "");
diff -r 4a0ccc9dc530 src/qemu_conf.h
--- a/src/qemu_conf.h	Wed Oct 08 11:53:45 2008 +0100
+++ b/src/qemu_conf.h	Wed Oct 08 11:59:33 2008 +0100
@@ -44,7 +44,8 @@
     QEMUD_CMD_FLAG_NO_REBOOT      = (1 << 2),
     QEMUD_CMD_FLAG_DRIVE          = (1 << 3),
     QEMUD_CMD_FLAG_DRIVE_BOOT     = (1 << 4),
-    QEMUD_CMD_FLAG_NAME           = (1 << 5),
+    QEMUD_CMD_FLAG_DRIVE_CACHE    = (1 << 5),
+    QEMUD_CMD_FLAG_NAME           = (1 << 6),
/* Main driver state */
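
The two hunks follow a simple pattern: detect a `-drive` sub-option by searching the `-help` text with `strstr()`, record it as a capability bit, and emit the sub-option only when that bit is set. The sketch below restates the pattern on its own. `probe_drive_flags` and `build_drive_opt` are hypothetical helpers, the bit values copy the patched enum, and the option string drops the `media` field and the disk-device check for brevity. Gating `,cache=off` on `QEMUD_CMD_FLAG_DRIVE_CACHE` means a QEMU binary whose help text does not mention `cache=on` is never handed a sub-option it may not understand.

```c
#include <stdio.h>
#include <string.h>

/* Bit values copied from the patched qemu_conf.h enum (subset only). */
enum {
    QEMUD_CMD_FLAG_DRIVE_BOOT  = (1 << 4),
    QEMUD_CMD_FLAG_DRIVE_CACHE = (1 << 5),
};

/* Hypothetical helper: derive -drive capability bits from `qemu -help` text. */
static unsigned int probe_drive_flags(const char *help)
{
    unsigned int flags = 0;
    if (strstr(help, "boot=on"))
        flags |= QEMUD_CMD_FLAG_DRIVE_BOOT;
    if (strstr(help, "cache=on"))
        flags |= QEMUD_CMD_FLAG_DRIVE_CACHE;
    return flags;
}

/*
 * Hypothetical helper: format a simplified -drive argument into opt.
 * Returns 0 on success, -1 if snprintf fails or the result was truncated.
 */
static int build_drive_opt(char *opt, size_t size, const char *src,
                           const char *bus, int index, int bootable,
                           unsigned int flags)
{
    int n = snprintf(opt, size, "file=%s,if=%s,index=%d%s%s",
                     src ? src : "", bus, index,
                     bootable && (flags & QEMUD_CMD_FLAG_DRIVE_BOOT)
                     ? ",boot=on" : "",
                     (flags & QEMUD_CMD_FLAG_DRIVE_CACHE)
                     ? ",cache=off" : "");
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With help text containing both `boot=on` and `cache=on`, a bootable IDE disk at index 0 yields `file=...,if=ide,index=0,boot=on,cache=off`; against an older binary, the same call simply omits the unsupported sub-options.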
