[libvirt] [PATCH 2/4] conf: Move hugepage XML validation check out of qemu_command
Michal Privoznik
mprivozn at redhat.com
Fri Jul 13 14:57:22 UTC 2018
On 07/13/2018 02:02 PM, Pavel Hrdina wrote:
> On Wed, Jul 11, 2018 at 06:03:08PM +0200, Pavel Hrdina wrote:
>> To make it clear I'll summarize all the possible combinations and how it
>> should work so we are on the same page.
>
> originally: before commit [1]
> now: after commit [1] (current master)
> expect: what this patch series should fix
>
>
> ======= Non-NUMA guests =======
>
>
> * Only one hugepage specified without any nodeset
>
> <memoryBacking>
> <hugepages>
> <page size='1048576' unit='KiB'/>
> </hugepages>
> </memoryBacking>
>
> This is correct, was always working and we should not change it.
>
> originally: works
> now: works
> expected: works
>
>
> * Only one hugapage specified with nodeset
>
> <memoryBacking>
> <hugepages>
> <page size='1048576' unit='KiB' nodeset='0'/>
> </hugepages>
> </memoryBacking>
>
> This is questionable since there is no guest NUMA node specified,
> but on the other hand we can consider non-NUMA guest to have exactly
> one NUMA node.
>
> This was working but has been broken by commit [1] which tried to
> fix a case where you are trying to specify non-existing NUMA node.
>
> Because of that assumption we can consider this as valid XML even
> though there is no NUMA node specified [2]. There are two possible
> solutions:
>
> - we can leave the XML intact
>
> - we can silently remove the 'nodeset' attribute to 'fix' the
> XML definition
>
> originally: works
> now: fails
> expect: works
>
>
> <memoryBacking>
> <hugepages>
> <page size='1048576' unit='KiB' nodeset='1'/>
> </hugepages>
> </memoryBacking>
>
> If the nodeset is != 0 it should newer work becuase there is no
> guest NUMA topology and even if we take into account the assumption
> that there is always one NUMA node it is still invalid.
>
> originally: works
> now: fails
> expect: fails
>
>
> * One hugepage with specific nodeset and second default hugepage
>
> <memoryBacking>
> <hugepages>
> <page size='1048576' unit='KiB' nodeset='0'/>
> <page size='2048' unit='KiB'/>
> </hugepages>
> </memoryBacking>
>
> This was working but was 'fixed' by commit [1] because the code
> checks only the first hugepage and uses only the first hugepage.
>
> It should never worked because it doesn't make any sense, there
> is no guest NUMA node configured and even if we take into account
> the assumption that non-NUMA guest has one NUMA node there is need
> for the default page size.
>
> originally: works
> now: fails
> expect: fails
>
>
> There is yet another issue with the current state in libvirt, if
> you swap the order of pages:
>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB'/>
> <page size='1048576' unit='KiB' nodeset='0'/>
> </hugepages>
> </memoryBacking>
>
> it will work even with current libvirt with the fix [1]. The reason
> is that code in qemuBuildMemPathStr() function takes into account
> only the first page size so it depends on the order of elements
> which is wrong.
>
> We should not allow any of these two configurations. Setting
> nodeset to != 0 will should not make any difference.
>
> originally: works
> now: works
> expect: fails
>
>
> ====== NUMA guest ======
>
>
> * Only one hugepage specified without any nodeset
>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB'/>
> </hugepages>
> </memoryBacking>
> ...
> <cpu mode='host-passthrough' check='none'>
> <topology sockets='1' cores='4' threads='2'/>
> <numa>
> <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
> <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
> </numa>
> </cpu>
>
> originally: works
> now: works
> expect: works
>
>
> * Only one hugapage specified with nodeset
>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB' nodeset='0'/>
> </hugepages>
> </memoryBacking>
> ...
> <cpu mode='host-passthrough' check='none'>
> <topology sockets='1' cores='4' threads='2'/>
> <numa>
> <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
> <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
> </numa>
> </cpu>
>
> All possible combinations for the nodeset attribute are allowed
> if it always corresponds to existing guest NUMA node:
>
> <page size='2048' unit='KiB' nodeset='1'/>
>
> or
>
> <page size='2048' unit='KiB' nodeset='0,1'/>
>
> originally: works
> now: works
> expect: works
>
>
> <memoryBacking>
> <hugepages>
> <page size='2048' unit='KiB' nodeset='2'/>
> </hugepages>
> </memoryBacking>
> ...
> <cpu mode='host-passthrough' check='none'>
> <topology sockets='1' cores='4' threads='2'/>
> <numa>
> <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
> <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
> </numa>
> </cpu>
>
> There is invalid guest NUMA node specified for the hugepage.
>
> originally: fails
> now: fails
> expect: fails
>
> * One hugepage with specific nodeset and second default hugepage
>
> <memoryBacking>
> <hugepages>
> <page size='1048576' unit='KiB' nodeset='0'/>
> <page size='2048' unit='KiB'/>
> </hugepages>
> </memoryBacking>
> ...
> <cpu mode='host-passthrough' check='none'>
> <topology sockets='1' cores='4' threads='2'/>
> <numa>
> <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
> <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
> </numa>
> </cpu>
>
> There are two guest NUMA nodes where we have default hugepage size
> configured and for NUMA node '0' we have a different size.
>
> originally: works
> now: works
> expect: works
>
>
> <memoryBacking>
> <hugepages>
> <page size='1048576' unit='KiB' nodeset='0,1'/>
> <page size='2048' unit='KiB'/>
> </hugepages>
> </memoryBacking>
> ...
> <cpu mode='host-passthrough' check='none'>
> <topology sockets='1' cores='4' threads='2'/>
> <numa>
> <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
> <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
> </numa>
> </cpu>
>
> originally: works
> now: works
> expect: fails ???
>
> In this situation all the guest NUMA nodes are covered by the
> hugepage size with specified nodeset attribute. The default one
> is not used at all so is pointless here.
>
> The difference between this and non-NUMA guest is that if we change
> the order it will still work as expected, it doesn't depend on the
> order of elements. However, we might consider is as invalid
> configuration because there is no need to have the default page size
> configured at all.
>
>
> * Multiple combination of default and specific hugepage sizes
>
> There are some restriction if we use multiple page sizes:
>
> - There cannot be two different default hugepage sizes
>
> - Two different page elements cannot have the same guest NUMA
> node specified in nodeset attribute
>
> - hugepages are not allowed if memory backing allocation is set
> to 'ondemand' (not documented)
>
> - hugepages are not allowed if memory backing source is set to
> 'anonymous' (not documented)
>
>
> I hope that there is no other configuration that we need to care about.
>
Okay, let's make our lives simpler. I retract my suggestion to allow
nodeset=0 for non-NUMA guest. Let's do it how you describe here.
Michal
More information about the libvir-list
mailing list