[vfio-users] 1 GB hugepages cause host crash on guest shutdown with some GPUs

Hristo Iliev hristo at hiliev.eu
Tue Dec 8 21:50:01 UTC 2015


On Tue, 8 Dec 2015 15:32:18 +0700 Okky Hendriansyah <okky at nostratech.com> wrote:

> On December 8, 2015 at 15:02:08, Hristo Iliev (hristo at hiliev.eu) wrote:
> 
>> ...
>>
>> When I shut down the host completely and then boot it with linux-lts, the 
>> VM boots up fine. I can reboot or shut down the VM and then boot it up
>> again multiple times without any issues. But should I reboot the host
>> without shutting it down, the VM would reliably hang on the OVMF splash.
>> It's really perplexing. Something I haven't tried yet though is to
>> disable the huge pages guest memory backing and see if it has any effect. 
>> 
>> Perhaps, I should simply replace the reboot action of my display manager 
>> with shutdown and declare it a solution :) 
>
> Hmm, the passed-through GPU probably wasn’t properly reinitialized or
> ejected during the forced host reboot, since the guest still held the
> handle. I’m not really sure about this. Why do you reboot the host without
> shutting down the guest? You probably need to configure something like a
> pre-reboot hook script to properly shut down all guests first.
> 

Sorry, I wasn't clear enough. I always shut down the VM before rebooting
the host. In pseudo-log format, what I observe looks like this:

shut down host
boot host with linux-lts
  boot VM -> ok
    shut down VM
  boot VM -> ok
    shut down VM
reboot host with linux-lts
  boot VM -> OVMF hang
    force off
  boot VM -> OVMF hang
    force off
reboot host with linux-lts
  boot VM -> OVMF hang
    force off
shut down host
boot host with linux-lts
  boot VM -> ok
    shut down VM

With linux-vfio-lts I can reboot the host at any time and the VM always
boots ok afterwards, even if the host was running linux-lts and the VM was
hanging before the reboot. So, a more complete version of the above scenario
looks like this:

host is off
boot host with linux-lts
  boot VM -> ok
    shut down VM
  boot VM -> ok
    shut down VM
reboot host with linux-lts
  boot VM -> OVMF hang
    force off
  boot VM -> OVMF hang
    force off
reboot host with linux-vfio-lts
  boot VM -> ok
    shut down VM
reboot host with linux-vfio-lts
  boot VM -> ok
    shut down VM
reboot host with linux-lts
  boot VM -> OVMF hang
    force off
reboot host with linux-vfio-lts
  boot VM -> ok
    shut down VM
shut down host
boot host with linux-lts
  boot VM -> ok
    shut down VM

I just ran a series of experiments and the latter scenario reproduces every
time. Both linux-vfio-lts and linux-lts are based on the 4.1.13 kernel. The
VM is 64-bit Windows 10.
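
The pattern in the log above boils down to a small decision rule. Here is a
minimal sketch of it in Python; the function and variable names are made up
for illustration, and the case "cold boot with linux-vfio-lts" does not
appear in the log, so its result here is an assumption rather than an
observation:

```python
# Kernel package names as used in this thread.
LTS = "linux-lts"
VFIO_LTS = "linux-vfio-lts"


def vm_boot_outcome(kernel: str, cold_boot: bool) -> str:
    """Return the VM boot result seen for a given kind of host start.

    cold_boot is True when the host was powered off before booting and
    False when it was rebooted without a power-off.
    """
    if kernel == LTS and not cold_boot:
        return "OVMF hang"
    # Every other combination that was tested booted fine. A cold boot
    # with linux-vfio-lts was NOT tested; "ok" is assumed for it.
    return "ok"


# Host starts from the second log: (kernel, cold_boot, observed result).
OBSERVED = [
    (LTS, True, "ok"),
    (LTS, False, "OVMF hang"),
    (VFIO_LTS, False, "ok"),
    (VFIO_LTS, False, "ok"),
    (LTS, False, "OVMF hang"),
    (VFIO_LTS, False, "ok"),
    (LTS, True, "ok"),
]


def mismatches():
    """List the log entries that the rule above fails to reproduce."""
    return [
        (kernel, cold, seen)
        for kernel, cold, seen in OBSERVED
        if vm_boot_outcome(kernel, cold) != seen
    ]
```

The rule depends only on the kernel currently running and on whether the
host went through a power-off, not on which kernel ran before the reboot.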

I've noticed that whenever the VM hangs, QEMU prints the following message
seven times:

ehci: PERIODIC list base register set while periodic schedule
      is enabled and HC is enabled

This seems to be related to the EHCI controller not being initialised
properly. There is an old (ca. 2011) bug report about Windows guests hanging
when USB devices are passed through. In my case, it looks like a symptom
rather than the actual cause: I'm not passing through any USB devices (I use
Synergy instead), and changing the USB controller type from 2.0 to 3.0
makes the messages go away but not the OVMF hang.
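
For anyone who wants to check their own QEMU output for the same symptom, a
minimal sketch in Python follows. It assumes the output has already been
captured as text (where it ends up depends on how the VM is started), and
the function name is made up for illustration:

```python
# First line of the two-line warning quoted above.
EHCI_WARNING = "ehci: PERIODIC list base register set while periodic schedule"


def count_ehci_warnings(log_text: str) -> int:
    """Count how many times the EHCI warning appears in captured output."""
    return sum(
        1
        for line in log_text.splitlines()
        if line.strip().startswith(EHCI_WARNING)
    )


# Sample input built from the message quoted above, repeated seven times.
sample = (
    "ehci: PERIODIC list base register set while periodic schedule\n"
    "      is enabled and HC is enabled\n"
) * 7
```

Only the first line of the message is matched, so the continuation line is
not counted as a second occurrence.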

> 
> ...
> 
> When yaourt asks you the first time, just answer that you want to
> compile all of them. I think by default it only compiles one component,
> hence you have to compile about 3 times even though you already did that
> in the first step.
> 

That's exactly what I'm doing: answering yes when asked whether it should
build the three packages at once. Despite that, it still wants to build the
headers package again afterwards, even though linux-vfio-lts-headers was
already built and installed during the previous step.

> 
> -- 
> Okky Hendriansyah




