[libvirt] [PATCH 5/5] tests: add qemuxml2argv memfd-memory-numa test

Mon Sep 17 11:07:21 UTC 2018

On 09/17/2018 11:30 AM, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Sep 14, 2018 at 11:44 AM, Michal Prívozník <mprivozn at redhat.com> wrote:
>> On 09/13/2018 11:51 PM, John Ferlan wrote:
>>>
>>>
>>> On 09/13/2018 10:09 AM, John Ferlan wrote:
>>>>
>>>>
>>>> On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
>>>>> Hi
>>>>>
>>>>> On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan at redhat.com> wrote:
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>>
>>>>>>>> So all that's "left":
>>>>>>>>
>>>>>>>>  1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not
>>>>>>>> changing from memory-backend-ram to memory-backend-memfd.  We already
>>>>>>>> check that "(src->mem.source != dst->mem.source)" - so we know we're
>>>>>>>> already anonymous or not.
>>>>>>>>
>>>>>>>> Any suggestions?  If source is anonymous, then what?  I think we can use
>>>>>>>> the qemuDomainObjPrivatePtr in some way to determine that we were
>>>>>>>> started with -memfd (or not started that way).
>>>>>>>
>>>>>>> No idea how we could save that information across various restarts /
>>>>>>> version changes.
>>>>>>
>>>>>> I think it'd be ugly... I think migration cookies would have to be
>>>>>> used... I considered other mechanisms, but each wouldn't quite work.
>>>>>> Without writing the code, if we cared to do this, then we'd have:
>>>>>>
>>>>>> 1. Add a field to qemuDomainObjPrivatePtr that indicates what got
>>>>>> started (none, memfd, file, or ram). Add a typedef enum that has
>>>>>> unknown, none, memfd, file, and ram. Add the Parse/Format code to handle
>>>>>> the field.
>>>>>>
>>>>>> 2. Modify the qemu_command code to set the field in priv based on what
>>>>>> got started, if something got started. The value would be > 0...
>>>>>>
>>>>>> 3. Mess with the migration cookie logic to add checks for what the
>>>>>> source started. On the destination side of that cookie if we had the
>>>>>> "right capabilities", then check the source cookie to see what it has.
>>>>>> If it didn't have that field, then I think one could assume the source
>>>>>> with anonymous memory backing would be using -ram. We'd already fail the
>>>>>> src/dst mem.source check if one used -file. I'm not all the versed in
>>>>>> the cookies, but I think that'd work "logically thinking" at least. The
>>>>>> devil would be in the details.
>>>>>>
>>>>>> Assuming your 3.1 patches do something to handle the condition, I guess
>>>>>> it comes does to how much of a problem it's believed this could be in
>>>>>> 2.12 and 3.0 if someone is running -ram and migrates to a host that
>>>>>> would default to -memfd.
>>>>>
>>>>> I am afraid we will need to do it to handle transparent -memfd usage.
>>>>> I'll look at it with your help.
>>>>>
>>>>
>>>> Let's see what I can cobble together. I'll repost the series a bit later
>>>> today hopefully.
>>>>
>>>
>>> After spending a few hours on this, the cookies just don't help enough
>>> or I don't know/understand enough about their usage.
>>>
>>> I keep coming back to the problem of how do we disallow a migration from
>>> a host that has/knows about and uses anonymous memfd to one that doesn't
>>> know about it. Similarly, if a domain source w/ "file" or "ram" (whether
>>> at startup time or via hotplug) is migrated to a target host that would
>>> generate memfd - we have no mechanism to stop the migration because we
>>> have no way to tell what it was running, especially since what gets
>>> started isn't just based off the source type - hugepages have a
>>> tangential role. Lots of logic stuffed into qemu_command that probably
>>> should have been in some qemuDomainPrepareMemtune API.
>>>
>>> So unfortunately, I think the only safe way is to create a new source
>>> type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly
>>> as the other entries are described (ironically the document default of
>>> "anonymous" could be "file" or it could be "ram" based 3 other factors
>>> not described in the docs). At least with a new type name/value we can
>>> guarantee that someone selects it by name rather than the multipurpose
>>> "anonymous" type. I think it would mean moving the caps checks to a bit
>>> later in the code, search for "otherwise check the required capability".
>>>
>>> Unless someone still brave enough to keep reading this stream has an
>>> idea to try. I'm tapped out!
>>
>> We can have an element/attribute in status XML/migration XML saying
>> which backend we've used. This is slightly tricky because we have more
>> places then one where users can tune confuguration such that we use
>> different backends. My personal favorite is:
>>
>>   <memoryBacking>
>>     <hugepages>
>>       <page size='2048' unit='KiB' nodeset='1'/>
>>     </hugepages>
>>   </memoryBacking>
>>
>>   <cpu>
>>     <numa>
>>       <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
>>       <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/>
>>       <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/>
>>       <cell id='3' cpus='3' memory='1048576' unit='KiB'/>
>>     </numa>
>>   </cpu>
>>
>>   <devices>
>>     <memory model='dimm'>
>>       <target>
>>         <size unit='KiB'>524288</size>
>>         <node>1</node>
>>       </target>
>>       <address type='dimm' slot='0' base='0x100000000'/>
>>     </memory>
>>   </devices>
>>
>>
>> So what we can have is:
>>
>>   <hugepages>
>>     <page size=.... backend='memory-backend-file'/>
>>   </hugepages>
>>
>>       <cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
>>       <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/>
>>       <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file/>
>>       <cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
>>
>>   <devices>
>>     <memory model='dimm' backend='memory-backend-ram'/>
> 
> That's a bit overkill to me, since we don't have (yet) the capacity
> for a user to select the memory backend, and the value is a
> qemu-specific detail.

So status XML is not something we parse from user. It's produced by
libvirt and it's a superset of user provided XML and some runtime
information. For instance, look around the lines where
VIR_DOMAIN_DEF_PARSE_STATUS flag occurs.

> 
>>     ..
>>   </devices>
>>
>>
>> This way we know what backend was used on the source (in saved state)
>> and the only thing we need to know on dst (on restore) is to check if
>> given backend is available.
>>
>> I don't think putting anything in migration cookies is going to help.
>> It might help migration if anything but it will definitely keep
>> save/restore broken as there are no migration cookies.
> 
> Ah, too bad. I am not familar enough with migration and save/restore
> in libvirt. But I started to imagine how the migration cookie could
> have been used.

>From qemu POV, there's no difference between migration and save/restore.
All of them is a migration except save/restore is migration to/from a
file (FD actually).

> 
> Is there only in the domain XML we can save information?

Yes, status XML. That's where libvirt keeps its runtime information (and
which backend was used falls exactly into this category) so that it is
preserved on the daemon restart.

> 
> If yes, then either we go with your proposal (although I wonder if it
> should be qemu: namespaced) or can we introduce libvirt capabilites?
> (something as simple as
> <capabilities><qemu-memorybackend-memfd</capabilities>) ?

No need. Once again, this is not something that users will ever see, nor
libvirt would parse it when parsing input from user.

The same applies for migration XML. These two are different in some
aspects, but that is not critical for this feature. It's sufficient to
say for now that status XML preserves runtime data between daemon
restarts (we want freshly restarted libvirt to remember what backend was
used) and migration XML preserves runtime data on migration (we want the
destination to know what backend is used).

Michal