[libvirt] [PATCH 2/4] conf: Move hugepage XML validation check out of qemu_command

Michal Privoznik mprivozn at redhat.com
Fri Jul 13 14:57:22 UTC 2018


On 07/13/2018 02:02 PM, Pavel Hrdina wrote:
> On Wed, Jul 11, 2018 at 06:03:08PM +0200, Pavel Hrdina wrote:
>> To make it clear I'll summarize all the possible combinations and how it
>> should work so we are on the same page.
> 
> originally: before commit [1]
> now: after commit [1] (current master)
> expect: what this patch series should fix
> 
> 
> ======= Non-NUMA guests =======
> 
> 
> * Only one hugepage specified without any nodeset
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='1048576' unit='KiB'/>
>       </hugepages>
>     </memoryBacking>
> 
>     This is correct, it has always worked, and we should not change it.
> 
>     originally: works
>     now: works
>     expect: works
> 
> 
> * Only one hugepage specified with nodeset
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='1048576' unit='KiB' nodeset='0'/>
>       </hugepages>
>     </memoryBacking>
> 
>     This is questionable since there is no guest NUMA node specified,
>     but on the other hand we can consider a non-NUMA guest to have
>     exactly one NUMA node.
> 
>     This was working but has been broken by commit [1], which tried to
>     fix the case where a non-existing NUMA node is specified.
> 
>     Because of that assumption we can consider this valid XML even
>     though there is no NUMA node specified [2].  There are two possible
>     solutions:
> 
>         - we can leave the XML intact
> 
>         - we can silently remove the 'nodeset' attribute to 'fix' the
>           XML definition (a sketch of this option follows below)
> 
>     originally: works
>     now: fails
>     expect: works
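> 
>     A minimal sketch of the second option, treating a non-NUMA guest
>     as having exactly one NUMA node (node 0), so nodeset='0' is
>     accepted (and dropped) while any other nodeset is rejected.  The
>     struct and function names here are invented for illustration and
>     are not libvirt's real types:
> 
>     #include <stdbool.h>
>     #include <stdio.h>
> 
>     typedef struct {
>         unsigned long long size_kib;  /* page size in KiB */
>         bool has_nodeset;             /* nodeset attribute present? */
>         unsigned int nodeset;         /* single node, for simplicity */
>     } HugepageDef;
> 
>     /* Post-parse fixup for a guest without <numa>. */
>     static int
>     fixupHugepageNodeset(HugepageDef *page, size_t numa_cells)
>     {
>         if (!page->has_nodeset || numa_cells > 0)
>             return 0;                  /* nothing to fix up */
> 
>         if (page->nodeset != 0) {
>             fprintf(stderr, "nodeset refers to nonexistent guest "
>                             "NUMA node %u\n", page->nodeset);
>             return -1;
>         }
> 
>         page->has_nodeset = false;     /* silently 'fix' the XML */
>         return 0;
>     }
> 
>     int main(void)
>     {
>         HugepageDef page = { 1048576, true, 0 };
> 
>         if (fixupHugepageNodeset(&page, 0) == 0)
>             printf("nodeset kept: %s\n",
>                    page.has_nodeset ? "yes" : "no");
>         return 0;
>     }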
> 
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='1048576' unit='KiB' nodeset='1'/>
>       </hugepages>
>     </memoryBacking>
> 
>     If the nodeset is != 0 it should never work because there is no
>     guest NUMA topology, and even if we take into account the
>     assumption that there is always one NUMA node it is still invalid.
> 
>     originally: works
>     now: fails
>     expect: fails
> 
> 
> * One hugepage with specific nodeset and second default hugepage
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='1048576' unit='KiB' nodeset='0'/>
>         <page size='2048' unit='KiB'/>
>       </hugepages>
>     </memoryBacking>
> 
>     This was working but was 'fixed' by commit [1] because the code
>     checks and uses only the first hugepage.
> 
>     It should have never worked because it doesn't make any sense:
>     there is no guest NUMA node configured, and even if we take into
>     account the assumption that a non-NUMA guest has one NUMA node,
>     there is no need for the default page size.
> 
>     originally: works
>     now: fails
>     expect: fails
> 
> 
>     There is yet another issue with the current state of libvirt: if
>     you swap the order of the pages:
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='2048' unit='KiB'/>
>         <page size='1048576' unit='KiB' nodeset='0'/>
>       </hugepages>
>     </memoryBacking>
> 
>     it will work even with current libvirt including the fix [1].  The
>     reason is that the code in the qemuBuildMemPathStr() function takes
>     into account only the first page size, so the result depends on the
>     order of the elements, which is wrong (a simplified illustration
>     follows below).
> 
>     We should not allow either of these two configurations.  Setting
>     nodeset to != 0 should not make any difference.
> 
>     originally: works
>     now: works
>     expect: fails
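> 
>     A simplified illustration of that order dependency; this is not
>     the actual qemuBuildMemPathStr() code, and the types are invented
>     for this example:
> 
>     #include <stdbool.h>
>     #include <stdio.h>
> 
>     typedef struct {
>         unsigned long long size_kib;
>         bool has_nodeset;
>     } Page;
> 
>     /* Buggy behavior: the backing path is derived from pages[0]
>      * only, so swapping the <page> elements changes the result. */
>     static unsigned long long
>     firstPageSize(const Page *pages, size_t npages)
>     {
>         return npages ? pages[0].size_kib : 0;
>     }
> 
>     int main(void)
>     {
>         Page order_a[] = { { 1048576, true }, { 2048, false } };
>         Page order_b[] = { { 2048, false }, { 1048576, true } };
> 
>         /* same configuration, different element order: */
>         printf("order A -> %llu KiB\n", firstPageSize(order_a, 2));
>         printf("order B -> %llu KiB\n", firstPageSize(order_b, 2));
>         return 0;
>     }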
> 
> 
> ====== NUMA guest ======
> 
> 
> * Only one hugepage specified without any nodeset
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='2048' unit='KiB'/>
>       </hugepages>
>     </memoryBacking>
>     ...
>     <cpu mode='host-passthrough' check='none'>
>       <topology sockets='1' cores='4' threads='2'/>
>       <numa>
>         <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
>         <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
>       </numa>
>     </cpu>
> 
>     originally: works
>     now: works
>     expect: works
> 
> 
> * Only one hugepage specified with nodeset
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='2048' unit='KiB' nodeset='0'/>
>       </hugepages>
>     </memoryBacking>
>     ...
>     <cpu mode='host-passthrough' check='none'>
>       <topology sockets='1' cores='4' threads='2'/>
>       <numa>
>         <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
>         <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
>       </numa>
>     </cpu>
> 
>     All possible combinations for the nodeset attribute are allowed
>     as long as every node in it corresponds to an existing guest NUMA
>     node:
> 
>         <page size='2048' unit='KiB' nodeset='1'/>
> 
>         or
> 
>         <page size='2048' unit='KiB' nodeset='0,1'/>
> 
>     originally: works
>     now: works
>     expect: works
> 
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='2048' unit='KiB' nodeset='2'/>
>       </hugepages>
>     </memoryBacking>
>     ...
>     <cpu mode='host-passthrough' check='none'>
>       <topology sockets='1' cores='4' threads='2'/>
>       <numa>
>         <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
>         <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
>       </numa>
>     </cpu>
> 
>     Here an invalid guest NUMA node is specified for the hugepage:
>     node '2' does not exist (a sketch of this check follows below).
> 
>     originally: fails
>     now: fails
>     expect: fails
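> 
>     A sketch of this validation rule: every node in a page's nodeset
>     must name an existing guest NUMA cell.  The representation (a
>     plain bitmask) and the function name are invented for this
>     example:
> 
>     #include <stdio.h>
> 
>     /* nodeset as a bitmask over guest NUMA node ids */
>     static int
>     validatePageNodeset(unsigned long nodeset_mask, size_t ncells)
>     {
>         unsigned long valid = ncells >= sizeof(long) * 8
>                               ? ~0UL : (1UL << ncells) - 1;
> 
>         if (nodeset_mask & ~valid) {
>             fprintf(stderr, "hugepage nodeset names a NUMA node "
>                             "that is not defined in <numa>\n");
>             return -1;
>         }
>         return 0;
>     }
> 
>     int main(void)
>     {
>         size_t ncells = 2;   /* cells 0 and 1, as in the XML above */
> 
>         printf("nodeset='0,1': %s\n",
>                validatePageNodeset(0x3, ncells) ? "rejected" : "ok");
>         printf("nodeset='2':   %s\n",
>                validatePageNodeset(0x4, ncells) ? "rejected" : "ok");
>         return 0;
>     }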
> 
> * One hugepage with specific nodeset and second default hugepage
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='1048576' unit='KiB' nodeset='0'/>
>         <page size='2048' unit='KiB'/>
>       </hugepages>
>     </memoryBacking>
>     ...
>     <cpu mode='host-passthrough' check='none'>
>       <topology sockets='1' cores='4' threads='2'/>
>       <numa>
>         <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
>         <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
>       </numa>
>     </cpu>
> 
>     There are two guest NUMA nodes: the default hugepage size is used
>     for node '1', while node '0' gets a different size.
> 
>     originally: works
>     now: works
>     expect: works
> 
> 
>     <memoryBacking>
>       <hugepages>
>         <page size='1048576' unit='KiB' nodeset='0,1'/>
>         <page size='2048' unit='KiB'/>
>       </hugepages>
>     </memoryBacking>
>     ...
>     <cpu mode='host-passthrough' check='none'>
>       <topology sockets='1' cores='4' threads='2'/>
>       <numa>
>         <cell id='0' cpus='0-3' memory='4194304' unit='KiB'/>
>         <cell id='1' cpus='4-7' memory='4194304' unit='KiB'/>
>       </numa>
>     </cpu>
> 
>     originally: works
>     now: works
>     expect: fails ???
> 
>     In this situation all the guest NUMA nodes are covered by the
>     hugepage size with the specified nodeset attribute.  The default
>     one is not used at all, so it is pointless here.
> 
>     The difference from the non-NUMA guest case is that if we change
>     the order it will still work as expected; it doesn't depend on the
>     order of the elements.  However, we might consider it an invalid
>     configuration because there is no need to have the default page
>     size configured at all (one possible check is sketched below).
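> 
>     One possible way to detect that questionable case: if the explicit
>     nodesets already cover every guest NUMA cell, a default
>     (nodeset-less) <page> can never be used.  Names and representation
>     are invented for this sketch:
> 
>     #include <stdbool.h>
>     #include <stdio.h>
> 
>     static bool
>     defaultPageUnused(const unsigned long *nodeset_masks,
>                       size_t npages, bool have_default_page,
>                       size_t ncells)
>     {
>         unsigned long covered = 0;
>         unsigned long all = (1UL << ncells) - 1; /* assumes ncells < 64 */
> 
>         for (size_t i = 0; i < npages; i++)
>             covered |= nodeset_masks[i];
> 
>         return have_default_page && (covered & all) == all;
>     }
> 
>     int main(void)
>     {
>         unsigned long masks[] = { 0x3 };   /* nodeset='0,1' */
> 
>         printf("default page unused: %s\n",
>                defaultPageUnused(masks, 1, true, 2) ? "yes" : "no");
>         return 0;
>     }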
> 
> 
> * Multiple combination of default and specific hugepage sizes
> 
>     There are some restrictions when multiple page sizes are used
>     (a validation sketch follows the list):
> 
>         - There cannot be two different default hugepage sizes
> 
>         - Two different page elements cannot have the same guest NUMA
>           node specified in nodeset attribute
> 
>         - hugepages are not allowed if memory backing allocation is set
>           to 'ondemand' (not documented)
> 
>         - hugepages are not allowed if memory backing source is set to
>           'anonymous' (not documented)
> 
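>     A minimal sketch of these restrictions; the enums, struct layout
>     and function name are invented for illustration and do not match
>     libvirt's internal API:
> 
>     #include <stdbool.h>
>     #include <stdio.h>
> 
>     typedef enum { ALLOC_IMMEDIATE, ALLOC_ONDEMAND } Alloc;
>     typedef enum { SOURCE_FILE, SOURCE_ANONYMOUS } Source;
> 
>     typedef struct {
>         bool has_nodeset;
>         unsigned long nodeset_mask;  /* bitmask of guest NUMA nodes */
>     } Page;
> 
>     static int
>     validateHugepages(const Page *pages, size_t npages,
>                       Alloc alloc, Source source)
>     {
>         bool have_default = false;
>         unsigned long seen = 0;
> 
>         if (npages == 0)
>             return 0;    /* no hugepages, nothing to check */
> 
>         if (alloc == ALLOC_ONDEMAND || source == SOURCE_ANONYMOUS) {
>             fprintf(stderr, "hugepages are incompatible with "
>                     "allocation='ondemand' and source='anonymous'\n");
>             return -1;
>         }
> 
>         for (size_t i = 0; i < npages; i++) {
>             if (!pages[i].has_nodeset) {
>                 if (have_default) {
>                     fprintf(stderr, "two default hugepage sizes\n");
>                     return -1;
>                 }
>                 have_default = true;
>             } else {
>                 if (seen & pages[i].nodeset_mask) {
>                     fprintf(stderr, "NUMA node listed in two "
>                             "<page> elements\n");
>                     return -1;
>                 }
>                 seen |= pages[i].nodeset_mask;
>             }
>         }
>         return 0;
>     }
> 
>     int main(void)
>     {
>         Page pages[] = { { false, 0 }, { true, 0x1 } };
> 
>         printf("valid: %s\n",
>                validateHugepages(pages, 2, ALLOC_IMMEDIATE,
>                                  SOURCE_FILE) ? "no" : "yes");
>         return 0;
>     }
> 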
> 
> I hope that there is no other configuration that we need to care about.
> 

Okay, let's make our lives simpler. I retract my suggestion to allow
nodeset='0' for non-NUMA guests. Let's do it the way you describe here.

Michal



