[libvirt] [PATCH 2/2] qemu: Check for guest NUMA nodes properly when validating hugepages

Michal Prívozník mprivozn at redhat.com
Wed Jul 4 13:01:22 UTC 2018


On 07/04/2018 02:56 PM, Pavel Hrdina wrote:
> On Wed, Jul 04, 2018 at 02:35:23PM +0200, Michal Prívozník wrote:
>> On 07/04/2018 02:05 PM, Pavel Hrdina wrote:
>>> On Wed, Jul 04, 2018 at 01:51:57PM +0200, Michal Privoznik wrote:
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1448149
>>>
>>> This BZ is not related to the patch, it describes different issue.
>>>
>>> There is different BZ that tracks the issue:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1591235
>>>
>>>> In fa6bdf6afa878 and ec982f6d929f3c23 I've tried to forbid
>>>> configuring <hugepages/> for non-existent guest NUMA nodes.
>>>> However, the check was not fine tuned enough as it failed when no
>>>> guest NUMA was configured but <hugepages/> were configured for
>>>> guest NUMA node #0.
>>>>
>>>> Signed-off-by: Michal Privoznik <mprivozn at redhat.com>
>>>> ---
>>>>  src/qemu/qemu_command.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
>>>> index 04c5c28438..15caf392aa 100644
>>>> --- a/src/qemu/qemu_command.c
>>>> +++ b/src/qemu/qemu_command.c
>>>> @@ -7421,7 +7421,7 @@ qemuBuildMemPathStr(virQEMUDriverConfigPtr cfg,
>>>>  
>>>>      if (def->mem.hugepages[0].nodemask) {
>>>>          ssize_t next_bit = virBitmapNextSetBit(def->mem.hugepages[0].nodemask, -1);
>>>> -        if (next_bit >= 0) {
>>>> +        if (next_bit > 0) {
>>>>              virReportError(VIR_ERR_XML_DETAIL,
>>>>                             _("hugepages: node %zd not found"),
>>>>                             next_bit);
>>>
>>> This code is weird, it checks only the first hugepage element but 
>>> the XML in this case can look like this:
>>>
>>>   <memoryBacking>
>>>     <hugepages>
>>>       <page size='2048' unit='KiB' nodeset='0'/>
>>>       <page size='1048576' unit='KiB'/>
>>>     </hugepages>
>>>     <locked/>
>>>   </memoryBacking>
>>>
>>> which is obviously wrong and would be accepted and 2048KiB pages would
>>> be used.
>>
>> Can you elaborate more on why do you consider this erroneous behvaiour?
>> The whole idea behind @nodeset was to have one default (1GiB in this
>> case) and then allow users fine tune hugepages used for NUMA nodes. So
>> the config you pasted says: 1GiB is the default and for NUMA node #0 use
>> 2MiB.
> 
> Well, this patch fixes an issue where you don't have NUMA nodes for the
> guest, but the XML example would be accepted as valid for guest without
> NUMA nodes and that is wrong.
> 
> The 2MiB would be the default and the 1GiB would be ignored.
> 
>>>
>>>> @@ -7577,7 +7577,7 @@ qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
>>>>          }
>>>>  
>>>>          next_bit = virBitmapNextSetBit(def->mem.hugepages[i].nodemask, pos);
>>>> -        if (next_bit >= 0) {
>>>> +        if (next_bit > 0) {
>>>>              virReportError(VIR_ERR_XML_DETAIL,
>>>>                             _("hugepages: node %zd not found"),
>>>>                             next_bit);
>>>
>>> The fact that we have this check in two places makes me thing that it's
>>> wrong.  
>>
>> The reason is that we have two options for generating command line to
>> enable hugepages (which are not compatible so we have to have both to
>> maintain backward compatibility upon migration):
>>
>> 1) -mem-path /path/to/hugetlbfs
>> 2) -object memory-backend-file
> 
> The check is executed for the same data with the same result in two
> different paths, so we can move it up before the path is split.
> 
>>> Actually this should be moved from qemu build command into qemu
>>> validate code or event into generic validate code.
>>
>> I fear we can't do that. Historically we've accepted a wide variety of
>> misconfigured domains from this respect and putting the check into
>> validation code would mean losing them on daemon restart. That's the
>> reason we have those checks at domain startup time.
> 
> The validation code is not executed on daemon restart exactly for that
> reason so there is no issue with that.
> 

Well, if you think you can do this then I am all for it and I'm holding 
my fingers crossed for you. I've spent non-trivial amount of time trying 
to fix all the corner cases (roughly:

  libvirt.git $ git log --author=mprivozn --oneline | grep -i huge | wc -l

) and it would be pity if we would break it now.

Michal




More information about the libvir-list mailing list