[libvirt] [PATCH 5/5] tests: add qemuxml2argv memfd-memory-numa test

Marc-André Lureau marcandre.lureau at redhat.com
Mon Sep 17 11:22:02 UTC 2018


Hi

On Mon, Sep 17, 2018 at 3:07 PM, Michal Privoznik <mprivozn at redhat.com> wrote:
> On 09/17/2018 11:30 AM, Marc-André Lureau wrote:
>> Hi
>>
>> On Fri, Sep 14, 2018 at 11:44 AM, Michal Prívozník <mprivozn at redhat.com> wrote:
>>> On 09/13/2018 11:51 PM, John Ferlan wrote:
>>>>
>>>>
>>>> On 09/13/2018 10:09 AM, John Ferlan wrote:
>>>>>
>>>>>
>>>>> On 09/13/2018 03:39 AM, Marc-André Lureau wrote:
>>>>>> Hi
>>>>>>
>>>>>> On Thu, Sep 13, 2018 at 2:25 AM, John Ferlan <jferlan at redhat.com> wrote:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>
>>>>>>>>> So all that's "left":
>>>>>>>>>
>>>>>>>>>  1. "Add" a check in qemuDomainABIStabilityCheck to ensure we're not
>>>>>>>>> changing from memory-backend-ram to memory-backend-memfd.  We already
>>>>>>>>> check that "(src->mem.source != dst->mem.source)" - so we know we're
>>>>>>>>> already anonymous or not.
>>>>>>>>>
>>>>>>>>> Any suggestions?  If source is anonymous, then what?  I think we can use
>>>>>>>>> the qemuDomainObjPrivatePtr in some way to determine that we were
>>>>>>>>> started with -memfd (or not started that way).
>>>>>>>>
>>>>>>>> No idea how we could save that information across various restarts /
>>>>>>>> version changes.
>>>>>>>
>>>>>>> I think it'd be ugly... I think migration cookies would have to be
>>>>>>> used... I considered other mechanisms, but each wouldn't quite work.
>>>>>>> Without writing the code, if we cared to do this, then we'd have:
>>>>>>>
>>>>>>> 1. Add a field to qemuDomainObjPrivatePtr that indicates what got
>>>>>>> started (none, memfd, file, or ram). Add a typedef enum that has
>>>>>>> unknown, none, memfd, file, and ram. Add the Parse/Format code to handle
>>>>>>> the field.
>>>>>>>
>>>>>>> 2. Modify the qemu_command code to set the field in priv based on what
>>>>>>> got started, if something got started. The value would be > 0...
>>>>>>>
>>>>>>> 3. Mess with the migration cookie logic to add checks for what the
>>>>>>> source started. On the destination side of that cookie if we had the
>>>>>>> "right capabilities", then check the source cookie to see what it has.
>>>>>>> If it didn't have that field, then I think one could assume the source
>>>>>>> with anonymous memory backing would be using -ram. We'd already fail the
>>>>>>> src/dst mem.source check if one used -file. I'm not all the versed in
>>>>>>> the cookies, but I think that'd work "logically thinking" at least. The
>>>>>>> devil would be in the details.
>>>>>>>
>>>>>>> Assuming your 3.1 patches do something to handle the condition, I guess
>>>>>>> it comes does to how much of a problem it's believed this could be in
>>>>>>> 2.12 and 3.0 if someone is running -ram and migrates to a host that
>>>>>>> would default to -memfd.
>>>>>>
>>>>>> I am afraid we will need to do it to handle transparent -memfd usage.
>>>>>> I'll look at it with your help.
>>>>>>
>>>>>
>>>>> Let's see what I can cobble together. I'll repost the series a bit later
>>>>> today hopefully.
>>>>>
>>>>
>>>> After spending a few hours on this, the cookies just don't help enough
>>>> or I don't know/understand enough about their usage.
>>>>
>>>> I keep coming back to the problem of how do we disallow a migration from
>>>> a host that has/knows about and uses anonymous memfd to one that doesn't
>>>> know about it. Similarly, if a domain source w/ "file" or "ram" (whether
>>>> at startup time or via hotplug) is migrated to a target host that would
>>>> generate memfd - we have no mechanism to stop the migration because we
>>>> have no way to tell what it was running, especially since what gets
>>>> started isn't just based off the source type - hugepages have a
>>>> tangential role. Lots of logic stuffed into qemu_command that probably
>>>> should have been in some qemuDomainPrepareMemtune API.
>>>>
>>>> So unfortunately, I think the only safe way is to create a new source
>>>> type ("anonmem", "anonfile", "anonmemfd", ??) and describe it as lightly
>>>> as the other entries are described (ironically the document default of
>>>> "anonymous" could be "file" or it could be "ram" based 3 other factors
>>>> not described in the docs). At least with a new type name/value we can
>>>> guarantee that someone selects it by name rather than the multipurpose
>>>> "anonymous" type. I think it would mean moving the caps checks to a bit
>>>> later in the code, search for "otherwise check the required capability".
>>>>
>>>> Unless someone still brave enough to keep reading this stream has an
>>>> idea to try. I'm tapped out!
>>>
>>> We can have an element/attribute in status XML/migration XML saying
>>> which backend we've used. This is slightly tricky because we have more
>>> places then one where users can tune confuguration such that we use
>>> different backends. My personal favorite is:
>>>
>>>   <memoryBacking>
>>>     <hugepages>
>>>       <page size='2048' unit='KiB' nodeset='1'/>
>>>     </hugepages>
>>>   </memoryBacking>
>>>
>>>   <cpu>
>>>     <numa>
>>>       <cell id='0' cpus='0' memory='1048576' unit='KiB'/>
>>>       <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared'/>
>>>       <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private'/>
>>>       <cell id='3' cpus='3' memory='1048576' unit='KiB'/>
>>>     </numa>
>>>   </cpu>
>>>
>>>   <devices>
>>>     <memory model='dimm'>
>>>       <target>
>>>         <size unit='KiB'>524288</size>
>>>         <node>1</node>
>>>       </target>
>>>       <address type='dimm' slot='0' base='0x100000000'/>
>>>     </memory>
>>>   </devices>
>>>
>>>
>>> So what we can have is:
>>>
>>>   <hugepages>
>>>     <page size=.... backend='memory-backend-file'/>
>>>   </hugepages>
>>>
>>>       <cell id='0' cpus='0' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
>>>       <cell id='1' cpus='1' memory='1048576' unit='KiB' memAccess='shared' backend='memory-backend-file'/>
>>>       <cell id='2' cpus='2' memory='1048576' unit='KiB' memAccess='private' backend='memory-backend-file/>
>>>       <cell id='3' cpus='3' memory='1048576' unit='KiB' backend='memory-backend-ram'/>
>>>
>>>   <devices>
>>>     <memory model='dimm' backend='memory-backend-ram'/>
>>
>> That's a bit overkill to me, since we don't have (yet) the capacity
>> for a user to select the memory backend, and the value is a
>> qemu-specific detail.
>
> So status XML is not something we parse from user. It's produced by
> libvirt and it's a superset of user provided XML and some runtime
> information. For instance, look around the lines where
> VIR_DOMAIN_DEF_PARSE_STATUS flag occurs.
>
>>
>>>     ..
>>>   </devices>
>>>
>>>
>>> This way we know what backend was used on the source (in saved state)
>>> and the only thing we need to know on dst (on restore) is to check if
>>> given backend is available.
>>>
>>> I don't think putting anything in migration cookies is going to help.
>>> It might help migration if anything but it will definitely keep
>>> save/restore broken as there are no migration cookies.
>>
>> Ah, too bad. I am not familar enough with migration and save/restore
>> in libvirt. But I started to imagine how the migration cookie could
>> have been used.
>
> From qemu POV, there's no difference between migration and save/restore.
> All of them is a migration except save/restore is migration to/from a
> file (FD actually).
>
>>
>> Is there only in the domain XML we can save information?
>
> Yes, status XML. That's where libvirt keeps its runtime information (and
> which backend was used falls exactly into this category) so that it is
> preserved on the daemon restart.
>
>>
>> If yes, then either we go with your proposal (although I wonder if it
>> should be qemu: namespaced) or can we introduce libvirt capabilites?
>> (something as simple as
>> <capabilities><qemu-memorybackend-memfd</capabilities>) ?
>
> No need. Once again, this is not something that users will ever see, nor
> libvirt would parse it when parsing input from user.
>
> The same applies for migration XML. These two are different in some
> aspects, but that is not critical for this feature. It's sufficient to
> say for now that status XML preserves runtime data between daemon
> restarts (we want freshly restarted libvirt to remember what backend was
> used) and migration XML preserves runtime data on migration (we want the
> destination to know what backend is used).

Ok

Wouldn't it be easier to have <source type="memfd"/>

Daniel didn't have a strong objection against it, it was more of a
suggestion for "anonymous" type improvement:
https://www.redhat.com/archives/libvir-list/2018-August/msg01841.html

Eventually, "anonymous" could be smartly changed to "memfd" by libvirt
when possible (from a non-resume start)

thanks




More information about the libvir-list mailing list