
Re: [augeas-devel] improving performance of aug_get() and aug_match() with large datasets



On 10/02/2015 02:32 PM, David Lutterkort wrote:
On Thu, Oct 1, 2015 at 11:44 AM, Laine Stump <laine redhat com
<mailto:laine redhat com>> wrote:

    On 09/22/2015 03:18 PM, Laine Stump wrote:

        It was bound to happen eventually. Someone created a host with
        514 vlan interfaces each connected to a host bridge, then
        started up virt-manager. [blah blah boring blah removed]

    To update those not included in a separate thread on the topic in
    netcf-devel (I'll try to keep all discussion here from now on):

    Dan Berrange pointed out that netcf was calling aug_load() on each
    entry to a public netcf API, and libvirt was calling netcf APIs
    multiple times for each interface. Even though aug_load() checks the
    mtime of files it has already loaded, and avoids re-loading those
    that haven't been modified (in this case none have been modified),
    it turns out that just doing a stat() of 1100 files takes a
    significant amount of time. So I modified netcf to only call
    aug_load() to do this check if it has been at least 1 second since
    the last time it was called. This made a very large improvement,
    especially when running the upstream versions of all involved
    packages (virt-manager --> libvirt --> netcf --> augeas). But when
    running the versions that are included in RHEL6, it wasn't so rosy.
    A test setup of 514 bridge+vlan interfaces which took around 30
    minutes (!!) to complete a full startup of virt-manager (which calls
    netcf/augeas to list all interfaces, then get the XML config for
    them) now takes 13 minutes with netcf modified to call aug_load()
    only once per second. (the same operation takes "only" 8 minutes
    using all upstream code).
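The once-per-second throttle described above could look roughly like the sketch below. It is not the actual netcf change: `struct reload_throttle`, `reload_due()`, `reload_at()`, `reload_if_stale()` and `count_reload()` are hypothetical names made up for illustration, and the callback stands in for whatever would really call aug_load().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical throttle state; not part of netcf or augeas. */
struct reload_throttle {
    struct timespec last;   /* monotonic time of the last reload */
    bool have_last;         /* false until the first reload has run */
    int (*reload)(void *);  /* callback that would call aug_load() */
    void *opaque;           /* passed to the callback, e.g. the augeas handle */
};

/* True if no reload has run yet, or at least one full second has passed. */
static bool reload_due(const struct reload_throttle *t, struct timespec now)
{
    if (!t->have_last)
        return true;
    time_t sec = now.tv_sec - t->last.tv_sec;
    long nsec = now.tv_nsec - t->last.tv_nsec;
    if (nsec < 0)
        sec--;              /* borrow one second for the nanosecond part */
    return sec >= 1;
}

/* Run the callback if a reload is due at time 'now'.
 * Returns the callback's result, or 0 if the reload was skipped. */
static int reload_at(struct reload_throttle *t, struct timespec now)
{
    assert(t != NULL && t->reload != NULL);
    if (!reload_due(t, now))
        return 0;
    t->last = now;
    t->have_last = true;
    return t->reload(t->opaque);
}

/* Same as reload_at(), using the monotonic clock so that wall-clock
 * adjustments cannot suppress or force a reload. Returns -1 if the
 * clock cannot be read. */
static int reload_if_stale(struct reload_throttle *t)
{
    struct timespec now;
    if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
        return -1;
    return reload_at(t, now);
}

/* Stand-in callback for illustration: counts calls instead of loading. */
static int count_reload(void *opaque)
{
    ++*(int *)opaque;
    return 0;
}
```

Each public API entry point would call reload_if_stale() instead of reloading unconditionally. The cost is that the tree can be up to one second out of date.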

    But 13 (or even 8) minutes is still a very long time, so I played
    around a bit in gdb and found that most of the time now seems to be
    spent in one call to aug_match():


       r = aug_match(aug, path, "/files/etc/sysconfig/network-scripts/*[
    DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER =
    ../*[BRIDGE = 'br1']/DEVICE ]/DEVICE");

    (this is the result of a call to netcf's aug_fmt_match() in the
    netcf function aug_get_xml_for_nif())

    When I step over that call to aug_match(), there is a very
    noticeable pause before the gdb prompt comes back, while continuing
    from that point all the way through virt-manager's "get all
    interfaces" loop back to the next call to aug_get_xml_for_nif()
    (including several other calls to aug_match() that have much simpler
    search expressions) seems to happen instantly.

    So apparently doing a match against all ifcfg files based on this
    complex match expression is really slowing us down. Any ideas on how
    to either make this expression simpler, or alternately how to get
    augeas doing the search more quickly?


Was that with the performance stuff I did a few days ago? (You'd need
Augeas HEAD for that.)

No, I am running the augeas that comes with Fedora 22 (1.4.0-1), or alternately the one that comes with RHEL6.7 (an ancient 1.0.0). Let me see if I can build augeas rpms from upstream (I'm in the middle of "make distcheck" right now) and see whether the latest code makes a difference.


Alternatively, can you send me your /etc/sysconfig/network-scripts ?

I actually have a set of scripts that create and destroy as many vlan+bridge pairs as I want, based on a 3 line .config file in the same directory. I'll tar that up and send it separately (I figure nobody would appreciate a binary attachment sent to the list :-)

(Fair warning: I will have no time to look into this next week)


        I have two questions based on this:

        1) has anyone thought about/looked into optimizing/changing the
        data structure used to store nodes in augeas to scale better
        with larger datasets (execution time seems to grow faster than linearly)?


From what Dominic turned up, the problem doesn't seem to be so much the
data structure for the tree, as the fact that there was some O(n^2)
behavior in building intermediate data structures.

        2) I recall that a long time ago augeas put in code to
        re-read/parse files only if they had been modified. netcf (and
        thus libvirt) could take advantage of this info if it was
        available in the augeas API - the first time it retrieved the
        info for an interface it would take a hit, but all subsequent
        times could be much quicker.


    About this one - I'm wondering how well it would work out for augeas
    to use inotify to learn about modifications to files (including the
    directory that the ifcfg files live in, in case a new file is
    created). It works okay for netcf to avoid calling aug_load() (as
    mentioned above), but it does make me a bit uncomfortable that we
    sometimes have a mistaken view of the config.


It would definitely be a possibility - we would still need to queue
notifications from inotify and only act on them when the user calls
aug_load to avoid things like destroying changes the user made; IOW, it
still needs to stay predictable when the tree changes based on changes
in the FS. It's been a while since I've looked at inotify, but I think
it would also introduce a Linux dependency; we could work around that by
only using it where available, and falling back to today's behavior.


