[rhelv6-list] KVM issues post RHEL6-1->6.2 update

Matthias Saou <matthias@saou.eu>
Tue Dec 20 12:30:37 UTC 2011


Hi,

I've just been bitten by the exact same problem on the only two RHEL6
servers I have running as KVM hosts. All guests got killed during the
host's package update, and it was impossible to start any of them
again, even after rebooting the hosts:

[root@x qemu]# virsh start y
error: Failed to start domain y
error: internal error unable to reserve PCI address 0:0:2.0

All of these guests have been installed with virt-install and kickstart
and all are headless. Not a single manual tweak has been made to their
XML configuration.

The <address type='pci' domain='0x0000' bus='0x00' slot='0x02'
function='0x0'/> entry is the first NIC of each of these guests.
Changing the slot from "2" to "5" or "6" (the next available number
in each guest's configuration) fixes the problem.
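
For the record, the workaround was roughly this (the guest name and
the free slot number are just examples; pick whatever is unused in
each guest's XML):

[root@x qemu]# virsh edit y
  ... change the NIC's address line to a free slot, e.g.:
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
[root@x qemu]# virsh start y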

But then, the exact same problem again! A kernel panic at startup,
with the following message, on a guest that was in the middle of its
own update when the host killed it:

VFS: Cannot open root device "UUID=f7302f62-6df1-4670-9232-5614daf500a2" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
Pid: 1, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1
Call Trace:
 [<ffffffff814ec341>] ? panic+0x78/0x143
 [<ffffffff81c203ed>] ? mount_block_root+0x1ea/0x29e
 [<ffffffff81002930>] ? trace_kmalloc+0x260/0x930
 [<ffffffff81c204f7>] ? mount_root+0x56/0x5a
 [<ffffffff81c2066b>] ? prepare_namespace+0x170/0x1a9
 [<ffffffff81c1f911>] ? kernel_init+0x2e3/0x2f9
 [<ffffffff8100c14a>] ? child_rip+0xa/0x20
 [<ffffffff81c1f62e>] ? kernel_init+0x0/0x2f9
 [<ffffffff8100c140>] ? child_rip+0x0/0x20

Booting the previous kernel worked; then came a whole lot of yum/rpm
fun to get that guest back to a decent state (90 duplicate
packages...).
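
In case it helps anyone else stuck at the same point, the duplicate
cleanup can be done with the yum-utils tools (a rough sketch; review
what it wants to remove before letting it):

[root@guest ~]# yum-complete-transaction      (finish the interrupted transaction)
[root@guest ~]# package-cleanup --dupes       (list the duplicate package versions)
[root@guest ~]# package-cleanup --cleandupes  (remove the older duplicates)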

There is something very wrong with the newer KVM-related packages not
allowing the PCI address 0:0:2.0 when that address was set by
virt-install itself (i.e. not by some unsupported tool). My guess is
that libvirt now reserves slot 2 for the (implicit) video device,
which would explain why headless guests, whose first NIC was given
slot 2 by virt-install, are the ones affected.

I'm starting to miss paravirtualization, where even when the host's
userland tools would stop working, the guests would typically keep
running...

Next fun thing: figure out how I managed to get the same guest
running twice (!!??), though the identical "Id" might indicate that
this is more of a libvirt bookkeeping problem...

[root@x qemu]# virsh list
 Id Name                 State
----------------------------------
  1 y                    running
  1 y                    running
  2 z                    running
  3 a                    running
  4 b                    running
  5 c                    running
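
A quick way to tell whether these really are two qemu processes or
just libvirt's bookkeeping gone wrong (a sketch, using the guest name
from above):

[root@x qemu]# ps -C qemu-kvm -o pid,cmd | grep -- '-name y'   (how many processes?)
[root@x qemu]# virsh domid y                                   (which Id does libvirt report?)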

Matthias

On Thu, 8 Dec 2011 09:31:39 +0000 (GMT)
Ben <bda20@cam.ac.uk> wrote:

> Just thought I'd share my experiences of updating a KVM host and
> guests this morning.  I'll acknowledge up front that I didn't do
> things in the right order so the mistakes were mine.
> 
> Start: RHEL6.1 KVM host, x2 RHEL6.1 guests using .img files (LVM
> partitions inside).  Fully up to date as of just before the RHEL6.2
> errata release.
> 
> I did "yum clean all ; yum update" on both the host and the guests at
> the same time (yeah, I know).  In my defence, a seemingly identical
> setup I did this on yesterday worked without issues.
> 
> At the point at which the host was completing its cleanup this
> happened in /var/log/messages:
> 
> Dec  8 07:14:47 frazil libvirtd: 07:14:47.926: 14778: warning : qemudDispatchSignalEvent:403 : Shutting down on signal 15
> Dec  8 07:14:49 frazil yum[1235]: Updated: libvirt-0.9.4-23.el6_2.1.x86_64
> 
> and further down
> 
>   Dec  8 07:15:00 frazil kernel: br1: port 2(vnet1) entering disabled state
>   Dec  8 07:15:00 frazil kernel: device vnet1 left promiscuous mode
>   Dec  8 07:15:00 frazil kernel: br1: port 2(vnet1) entering disabled state
>   Dec  8 07:15:02 frazil ntpd[2194]: Deleting interface #23 vnet1, fe80::fc54:ff:fe01:6b3b#123, interface stats: received=0, sent=0, dropped=0, active_time=7241352 secs
>   Dec  8 07:15:05 frazil kernel: br0: port 2(vnet0) entering disabled state
>   Dec  8 07:15:05 frazil kernel: device vnet0 left promiscuous mode
>   Dec  8 07:15:05 frazil kernel: br0: port 2(vnet0) entering disabled state
>   Dec  8 07:15:07 frazil ntpd[2194]: Deleting interface #25 vnet0, fe80::fc54:ff:fe49:fae6#123, interface stats: received=0, sent=0, dropped=0, active_time=7238050 secs
> 
> At this point I lost connection to the guests, which (according to
> the SSH sessions I had open to them) had apparently finished
> cleaning up after the yum update (the right-hand X/Y counter had
> completed) but hadn't returned a prompt yet, so were obviously still
> busy doing stuff.
> 
> I guess the restart of the libvirtd service dropped the guests
> (except that the same lines appear in the messages file of the
> server on which the guests didn't get killed).
> 
> Given I was rebooting the host anyway, I didn't bother to bring the
> guests back up first and just rebooted it (yeah, I know).  On reboot
> neither of the guests autostarted, so I logged in to the host and
> tried to start them with "virsh start <domain>".  Both complained that
> 
>   error: internal error unable to reserve PCI address 0:0:2.0
> 
> and didn't start.  Checking the .xml files for both guests I noted
> that
> 
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
> 
> was listed for the 'disk' device.  I also noticed that the following
> lines were missing
> 
>   <input type='mouse' bus='ps2'/>
>   <graphics type='vnc' port='5901' autoport='no'/>
>   <video>
>     <model type='cirrus' vram='9216' heads='1'/>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>   </video>
> 
> whereas they were in place on the host and guests which had
> successfully updated.  I added the missing lines, changed the
> 'disk' PCI slot to something else and, after restarting libvirtd,
> tried booting the guests again.  Still no joy, still the same error.
> In the end I commented out the "address type='pci'" line for 'video'
> and tried again.  This time the guests failed to boot the newly
> installed kernel at the point where the root LVM mount was
> attempted.  The panic recommended I look at the "root=" part of the
> boot line, but didn't suggest what to put there.
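> 
> For reference, the edit/restart cycle was roughly this (the domain
> name is a placeholder):
> 
>   # vi /etc/libvirt/qemu/<domain>.xml   (adjust PCI slots, re-add missing sections)
>   # service libvirtd restart            (have libvirt re-read the definitions)
>   # virsh start <domain>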
> 
> At this point I tried mounting the guests' disk images to see if the
> update of the kernel hadn't worked fully and the grub.conf was in a
> mess:
> 
>   # losetup /dev/loop0 foo.img      (attach the image to a loop device)
>   # kpartx -av /dev/loop0           (create device maps for its partitions)
>   # mount /dev/mapper/loop0p1 /mnt  (mount the first partition)
>   ...
>   # umount /mnt
>   # kpartx -dv /dev/loop0           (remove the partition maps)
>   # losetup -d /dev/loop0           (detach the loop device)
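> 
> Had I needed the root filesystem itself rather than just grub.conf,
> I would also have had to activate the LVM inside the image, roughly
> like this between the kpartx and mount steps above (volume group and
> logical volume names depend on the guest):
> 
>   # vgscan                          (scan for the guest's volume group)
>   # vgchange -ay <vgname>           (activate it)
>   # mount /dev/<vgname>/<lvname> /mnt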
> 
> Once inside the image I looked at the grub.conf files and couldn't
> see any issues.  I unmounted the image, tried booting into an older
> kernel, and the guests booted successfully.  "yum update" indicated
> an incomplete transaction, so I ran "yum-complete-transaction" and
> then "yum update kernel" and rebooted both guests successfully into
> the new kernel.  All now seems well.  Phew.
> 
> My questions are:
> 
> 1) Is it a bad idea to patch the host's libvirtd while guests are
>    running?
> 2) Should libvirtd have killed the guests like that?
> 3) With this update to KVM/qemu/libvirtd, are "address type='pci'"
>    lines now unnecessary and removable from
>    /etc/libvirt/qemu/<domain>.xml files, as PCI IDs are dynamically
>    assigned?
> 
> Ben



