[libvirt] Exposing mem-path in domain XML

Wed Sep 6 11:42:49 UTC 2017

On Wed, Sep 06, 2017 at 01:35:45PM +0200, Michal Privoznik wrote:
> On 09/05/2017 04:07 PM, Daniel P. Berrange wrote:
> > On Tue, Sep 05, 2017 at 03:59:09PM +0200, Michal Privoznik wrote:
> >> On 07/28/2017 10:59 AM, Daniel P. Berrange wrote:
> >>> On Fri, Jul 28, 2017 at 10:45:21AM +0200, Michal Privoznik wrote:
> >>>> On 07/27/2017 03:50 PM, Daniel P. Berrange wrote:
> >>>>> On Thu, Jul 27, 2017 at 02:11:25PM +0200, Michal Privoznik wrote:
> >>>>>> Dear list,
> >>>>>>
> >>>>>> there is the following bug [1] which I'm not quite sure how to grasp. So
> >>>>>> there is this application/infrastructure called Kove [2] that allows you
> >>>>>> to have memory for your application stored on a remote host on the
> >>>>>> network and basically fetch the needed region on pagefault. Now imagine
> >>>>>> that somebody wants to use it for backing domain memory. However, the way
> >>>>>> that the tool works is it has some kernel module and then some userland
> >>>>>> binary that is fed with the path of the mmaped file. I don't know all
> >>>>>> the details, but the point is, in order to let users use this we need to
> >>>>>> expose the paths for mem-path for the guest memory. I know we did not
> >>>>>> want to do this in the past, but now it looks like we don't have a way
> >>>>>> around it, do we?
> >>>>>
> >>>>> We don't want to expose the concept of paths in the XML because this is
> >>>>> a Linux-specific way to configure hugepages / shared memory. So we hide
> >>>>> the particular path used in the internal impl of the QEMU driver, and/or
> >>>>> via the qemu.conf global config file. I don't really want to change
> >>>>> that approach, particularly if the only reason is to integrate with a
> >>>>> closed source binary like Kove. 
> >>>>
> >>>> Yep, I agree with that. However, if you read the discussion in the
> >>>> linked bug you'll find that they need to know what file in the
> >>>> memory_backing_dir (from qemu.conf) corresponds to which domain. The
> >>>> reporter suggested using UUID-based filenames, which I fear is not
> >>>> enough because one can have multiple <memory model='dimm'/> devices
> >>>> configured for their domain. But I guess we could go with:
> >>>>
> >>>> ${memory_backing_dir}/${domName}        for generic memory
> >>>> ${memory_backing_dir}/${domName}_N      for Nth <memory/>
> >>>
> >>> This feels like it is going to lead to hell when you add in memory
> >>> hotplug/unplug, with inevitable races.
> >>>
> >>>> BTW: IIUC they want predictable names because they need to create the
> >>>> files before spawning qemu so that they are picked by qemu instead of
> >>>> using temporary names.
> >>>
> >>> I would like to know why they even need to associate particular memory
> >>> files with particular QEMU processes. eg if they're just exposing a
> >>> new type of tmpfs filesystem from the kernel why does it matter what
> >>> each file is used for.
> >>
> >> This might get you an answer:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1461214#c4
> >>
> >> So the way I understand it is that they will create the files, and
> >> provide us with paths. So luckily, we don't have to make up the paths on
> >> our own.
> > 
> > IOW it is pretending to be tmpfs except it is not behaving like tmpfs.
> > This doesn't really make me any more inclined to support this closed
> > source stuff in libvirt.
> 
> Yeah, that's my feeling too. So, what about the following: let's assume
> they will fix their code so that it is a proper tmpfs. Libvirt can then
> treat it just like it already treats hugetlbfs. For us it'll be just
> yet another type of hugepages. I mean, for hugepages we already create
> /hugepages/mount/point/libvirt/$domain for each domain, so the separation
> is there (even though this is considered internal impl). And since it
> would be a proper tmpfs, they can see the pid of the qemu process that is
> trying to mmap() (and take the name or whatever unique ID they want from
> there).

Yep, we can at least make a reasonable guarantee that all files belonging
to a single QEMU process will always be within the same sub-directory.
This allows the kmod to distinguish 2 files owned by separate VMs, from 2
files owned by the same VM and do what's needed. I don't see why it would
need to care about naming conventions beyond the layout.
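
As a rough illustration of that layout guarantee, here is a minimal Python
sketch (not libvirt or kmod code) of how a consumer could tell files apart
using only the parent directory. The example paths and the helper names
group_by_vm_dir() and same_vm() are made up for illustration; the only
assumption taken from the above is "one sub-directory per QEMU process".

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_vm_dir(paths):
    """Group backing-file paths by their parent directory.

    Assumption: all files of one QEMU process share one sub-directory,
    so the parent directory identifies the VM; file names are ignored.
    """
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        groups[str(path.parent)].append(path.name)
    return dict(groups)


def same_vm(a, b):
    """Return True if two backing files sit in the same sub-directory."""
    return PurePosixPath(a).parent == PurePosixPath(b).parent
```

Nothing here depends on what the individual files are called, which is
the point: the directory alone is enough to separate VMs.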

> I guess what I'm trying to ask is if it was proper tmpfs, we would be
> okay with it, wouldn't we?

If it is indistinguishable from tmpfs/hugetlbfs from libvirt's POV, we
should be fine; at most you would need an /etc/libvirt/qemu.conf change
to explicitly point at the custom mount point if libvirt doesn't
auto-detect the right one.
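
For illustration only, such a change might look roughly like the fragment
below. memory_backing_dir is the existing qemu.conf setting mentioned
earlier in the thread (hugetlbfs_mount is the analogous one for hugepage
mounts); the mount point shown is hypothetical.

```
# /etc/libvirt/qemu.conf
# Hypothetical mount point; substitute wherever the filesystem
# is actually mounted on the host.
memory_backing_dir = "/mnt/remote-mem"
```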

Regards,
Daniel