[libvirt] [frankschreuder at outlook.com: Fwd: Reproducible live disk detach bug]

Frank Schreuder fschreuder at transip.nl
Sat Aug 8 11:16:33 UTC 2015



Op 8/7/2015 om 10:18 AM schreef Frank Schreuder:
>
> Op 8/7/2015 om 12:24 AM schreef Laine Stump:
>>>> Subject: Reproducible live disk detach bug
>>>>
>>>> Hey guys,
>>>>
>>>> I am currently running into a reproducible libvirt bug when I detach a
>>>> disk from a running VM using libvirt 1.2.18 and qemu 2.3.0.
>>>>
>>>> I can attach a disk to a running VM without any problems.
>>>> During the attach operation I don't create a disk alias name, which
>>>> results in libvirt generating an unique alias.
>>>> Using virsh dumpxml I get the following output(snippet):
>>>>
>>>> <disk type='file' device='disk'>
>>>>        <driver name='qemu' type='raw' cache='none'/>
>>>>        <source file='/var/local/mnt/test-storage/test-storage.raw'/>
>>>>        <backingStore/>
>>>>        <target dev='vde' bus='virtio'/>
>>>>        <iotune>
>>>> <read_bytes_sec>94371840</read_bytes_sec>
>>>> <write_bytes_sec>31457280</write_bytes_sec>
>>>>          <read_iops_sec>1000</read_iops_sec>
>>>>          <write_iops_sec>200</write_iops_sec>
>>>>        </iotune>
>>>>        <alias name='virtio-disk4'/>
>>>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x09' 
>>>> function='0x0'/>
>>>> </disk>
>>>>
>>>> As you can see, libvirt generates 'virtio-disk4' as an alias. But as
>>>> soon as I try to detach the disk from the running VM, I will 
>>>> receive the
>>>> following two errors in my libvirt log:
>>>>
>>>> 2015-08-06 10:48:59.943+0000: 11657: warning :
>>>> qemuMonitorJSONHandleDeviceDeleted:939 : missing device in device
>>>> deleted event
>>>> 2015-08-06 10:48:59.945+0000: 11662: error :
>>>> qemuMonitorTextDriveDel:2594 : operation failed: deleting
>>>> drive-virtio-disk4 drive failed: 2015-08-06T10:48:59.945058Z Device
>>>> 'drive-virtio-disk4' not found
>>>>
>>>> Apparently libvirt tries to detach the disk with a drive- prefix, 
>>>> which
>>>> doesn't exist?
>> I can't tell you right off why the failure message is there, but the
>> alias with the added "drive-" is easily explained.
>>
>> Each <disk> element in the libvirt config corresponds to two entities in
>> the qemu virtual machine - a "device" component (what is seen by the
>> guest) and a "drive" component (what resources are used on the host to
>> back up that device). Each of these corresponds to a separate qemu
>> commandline argument (named "-device" and "-drive"), and each of those
>> must have a different, unique "id" option. For example, this is what is
>> added to my qemu commandline when I add a virtio disk named
>> "rawhide.qcow2" to my virtual machine:
>>
>> -drive
>> file=/var/lib/libvirt/images/rawhide.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 
>>
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 
>>
>>
>>
>> If you look at the status XML of the domain while it is running, you'll
>> see that the <alias> for the disk element is "virtio-disk0", and that's
>> what is given to the -device half of the disk as its qemu id. The -drive
>> half uses "drive-$alias" as its qemu id though (you'll notice that the
>> -device half references that id with its "drive" option - that's how the
>> two halves are tied together by qemu).
>>
>> So, there is nothing unusual about qemu wanting to delete something
>> called drive-virtio-disk4 when you have a <disk> with
>> alias='virtio-disk4'. It obviously isn't right that an error message is
>> being logged. Something sounds familiar about the "missing device in
>> device deleted event" message, but I can't recall details. Does anyone
>> else have a better memory? Frank - would it be possible for you to try
>> this same sequence of events with the same libvirt and an older qemu
>> (and/or vice-versa)?
>>
>
> I have some more interesting debug information.
>
> I'm -not- able to reproduce it on libvirt 1.2.11 - qemu 2.3.0 and 
> libvirt 1.2.11 - qemu 1.7.1 using a FreeBSD VM.
> I'm able to reproduce the errors with libvirt 1.2.11 - qemu 2.3.0 and 
> libvirt 1.2.18 - qemu 2.3.0 using a Debian VM.
> Seems like the error triggers depending on the guest OS?
>
> During hot-unplugging of network interfaces libvirt also triggers the 
> same error on a Debian VM:
>
> 2015-08-06 10:48:59.945+0000: 11662: error : 
> qemuMonitorJSONHandleDeviceDeleted:939 : missing device in device 
> deleted event
>
>
> I gathered some logging information with info level logging enabled 
> while detaching a disk:
> http://pastebin.com/raw.php?i=vVLuavG2
>
> Debug level logging output was too big for a pastebin, contact me if 
> you are interested in the debug level output logging.
>
> Hope this will help solving this issue, if there is more information I 
> can provide or tests I can execute, please let me know.
>
>

I may found something:

src/qemu/qemu_monitor_text.c
/* (qemu) drive_del wark
       * Device 'wark' not found */
     } else if (STRPREFIX(reply, "Device '") && (strstr(reply, "not 
found"))) {
          /* NB: device not found errors mean the drive was auto-deleted 
and we
           * ignore the error */

This piece of code should ignore the following error:
2015-08-06 21:14:56.847+0000: 8039: error : qemuMonitorTextDriveDel:2594 
: operation failed: deleting drive-virtio-disk4 drive failed: 
2015-08-06T21:14:56.847359Z Device 'drive-virtio-disk4' not found

When I look at the error message in the logs again, it seems like (newer 
versions of) qemu or libvirt prefixes the command output with a time 
stamp (2015-08-06T21:14:56.847359Z).
This means the string no longer starts with "Device '".

I made a patch to make this check less strict (untested).

diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c
index 2aa0460..1052450 100644
--- a/src/qemu/qemu_monitor_text.c
+++ b/src/qemu/qemu_monitor_text.c
@@ -2586,7 +2586,7 @@ int qemuMonitorTextDriveDel(qemuMonitorPtr mon,

      /* (qemu) drive_del wark
       * Device 'wark' not found */
-    } else if (STRPREFIX(reply, "Device '") && (strstr(reply, "not 
found"))) {
+    } else if (strstr(reply, "Device '") && (strstr(reply, "not found"))) {
          /* NB: device not found errors mean the drive was auto-deleted 
and we
           * ignore the error */
      } else if (STRNEQ(reply, "")) {

I hope somebody with more knowledge about libvirt can confirm this.

Thanks,
Frank




More information about the libvir-list mailing list