hdd kills vm

Martin Kletzander mkletzan at redhat.com
Wed Oct 25 10:30:34 UTC 2023


On Tue, Oct 24, 2023 at 04:28:58PM +0200, Martin Kletzander wrote:
>On Mon, Oct 23, 2023 at 04:59:08PM +0200, daggs wrote:
>>Greetings Martin,
>>
>>> Sent: Sunday, October 22, 2023 at 12:37 PM
>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>> To: "daggs" <daggs at gmx.com>
>>> Cc: libvir-list at redhat.com
>>> Subject: Re: hdd kills vm
>>>
>>> On Fri, Oct 20, 2023 at 02:42:38PM +0200, daggs wrote:
>>> >Greetings,
>>> >
>>> >I have a windows 11 vm running on my Gentoo using libvirt (9.8.0) + qemu (8.1.2), I'm passing almost all available resources to the vm
>>> >(all 16 cpus, 31 out of 32 GB, nVidia gpu is pt), but the performance is not good, system lags, takes long time to boot.
>>>
>>> There are a couple of things that stand out to me in your setup, and
>>> I'll assume the host has one NUMA node with 8 cores, each with 2
>>> threads, just like you set it up in the guest XML.
>>thats correct, see:
>>$ lscpu | grep -i numa
>>NUMA node(s):                       1
>>NUMA node0 CPU(s):                  0-15
>>
>>however:
>>$ dmesg | grep -i numa
>>[    0.003783] No NUMA configuration found
>>
>>can that be the reason?
>>
>
>No, that's fine; technically a single NUMA node is not NUMA at all, so
>nothing is wrong there.
>
>>>
>>> * When you give the guest all the CPUs the host has, there is nothing
>>>    left to run the host's own tasks.  You might think that there "isn't
>>>    anything running", but there is: at minimum your init system, the
>>>    kernel, and the QEMU process that is emulating the guest.  This is
>>>    definitely one of the bottlenecks.
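[As an illustration of the point above, a hedged sketch, not taken from the poster's XML: give the guest 14 of the 16 CPUs and keep one core plus its sibling thread for the host. The cpuset values 7 and 15 are assumptions based on the topology discussed in this thread; adjust them to your host.]

```xml
<!-- Sketch: 14 vCPUs for the guest, CPUs 7 and 15 (one core and its
     assumed sibling thread) left free for host tasks and QEMU itself. -->
<vcpu placement='static'>14</vcpu>
<cputune>
  <!-- Pin QEMU's emulator threads onto the CPUs reserved for the host -->
  <emulatorpin cpuset='7,15'/>
</cputune>
```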
>>I've tried with 12 out of 16, same behavior.
>>
>>>
>>> * The pinning of vCPUs to CPUs is half-suspicious.  If you are trying to
>>>    make vCPU 0 and 1 be threads on the same core and on the host the
>>>    threads are represented as CPUs 0 and 8, then that's fine.  If that is
>>>    just copy-pasted from somewhere, then it might not reflect the current
>>>    situation and can be source of many scheduling issues (even once the
>>>    above is dealt with).
>>I found a site that does it for you, if it is wrong, can you point me to a place I can read about it?
>>
>
>Just check what the topology is on the host and try to match it with the
>guest one.  If in doubt, then try it without the pinning.
>
>>>
>>> * I also seem to recall that Windows had some issues with systems that
>>>    have too many cores.  I'm not sure whether that was an issue with an
>>>    edition difference or just with some older versions, or if it just did
>>>    not show up in the task manager, but there was something that was
>>>    fixed by using either more sockets or cores in the topology.  This is
>>>    probably not the issue for you though.
>>>
>>> >after trying a few ways to fix it, I've concluded that the issue might be related to the way the hdd is defined at the vm level.
>>> >here is the xml: https://bpa.st/MYTA
>>> >I assume that the hdd sits on the sata ctrl causing the issue but I'm not sure what is the proper way to fix it, any ideas?
>>> >
>>>
>>> It looks like your disk is on SATA, but I don't see why that would be an
>>> issue. Passing the block device to QEMU as VirtIO shouldn't cause that
>>> much of a difference.  Try measuring the speed of the disk on the host
>>> and then in the VM maybe.  Is that an SSD or NVMe?  I presume it's
>>> not spinning rust, is it?
>>as seen, I have 3 drives: 2 cdroms on sata and one hdd passed through as virtio. I read somewhere that if the controller of the virtio
>>device is sata, then it doesn't use virtio optimally.
>
>Well it _might_ be slightly more beneficial to use virtio-scsi or even
><disk type='block' device='lun'>, but I can't imagine that would make
>the system lag.  I'm not that familiar with the details.
>
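[A concrete, untested sketch of the virtio-scsi + `<disk device='lun'>` suggestion above. The source path is a hypothetical placeholder, not the poster's actual device, and the driver options are common tuning defaults rather than anything confirmed in this thread.]

```xml
<!-- Sketch: whole-disk passthrough as a SCSI LUN on virtio-scsi. -->
<controller type='scsi' model='virtio-scsi'/>
<disk type='block' device='lun'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/sdX'/>  <!-- hypothetical host device path -->
  <target dev='sda' bus='scsi'/>
</disk>
```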
>>it is a spindle; nvmes are too expensive where I live. frankly, I don't need lightning-fast boot, the other bare-metal machines running windows on spindles
>>run it quite fast and they aren't half as fast as this server
>>
>
>That might actually be related.  The guest might think it is a different
>type of disk and use completely suboptimal scheduling.  This might
>actually be solved by passing it as <disk device='lun'..., but at this
>point I'm just guessing.
>

Also you probably want to use something like:

<target dev='sda' bus='scsi' rotation_rate='X'/>

and I have no idea whether matching the rotation_rate to the actual one
is beneficial, maybe skip that?
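[For context, a fuller disk element with the attribute in place. Per the libvirt domain XML documentation, rotation_rate takes 1 to report non-rotational (SSD-like) media, or a value in the 1025-65534 range to report an RPM; 7200 below is an assumed value for a typical desktop spindle, not something verified in this thread.]

```xml
<!-- Sketch: report the medium as a 7200 RPM rotational disk so the
     guest I/O scheduler treats it as spinning rust.  The source path
     is a hypothetical placeholder. -->
<disk type='block' device='lun'>
  <source dev='/dev/sdX'/>
  <target dev='sda' bus='scsi' rotation_rate='7200'/>
</disk>
```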

>>>
>>> >Thanks,
>>> >
>>> >Dagg.
>>> >
>>>
>>


