
Re: [augeas-devel] improving performance of aug_get() and aug_match() with large datasets

On Thu, Oct 1, 2015 at 11:44 AM, Laine Stump <laine redhat com> wrote:
On 09/22/2015 03:18 PM, Laine Stump wrote:
It was bound to happen eventually. Someone created a host with 514 vlan interfaces each connected to a host bridge, then started up virt-manager. [blah blah boring blah removed]
To update those not included in a separate thread on the topic in netcf-devel (I'll try to keep all discussion here from now on):

Dan Berrange pointed out that netcf was calling aug_load() on each entry to a public netcf API, and libvirt was calling netcf APIs multiple times for each interface. Even though aug_load() checks the mtime of files it has already loaded, and avoids re-loading those that haven't been modified (in this case none have been modified), it turns out that just doing a stat() of 1100 files takes a significant amount of time. So I modified netcf to only call aug_load() to do this check if it has been at least 1 second since the last time it was called. This made a very large improvement, especially when running the upstream versions of all involved packages (virt-manager --> libvirt --> netcf --> augeas). But when running the versions that are included in RHEL6, it wasn't so rosy. A test setup of 514 bridge+vlan interfaces which took around 30 minutes (!!) to complete a full startup of virt-manager (which calls netcf/augeas to list all interfaces, then get the XML config for them) now takes 13 minutes with netcf modified to call aug_load() only once per second. (the same operation takes "only" 8 minutes using all upstream code).
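The once-per-second check described above can be sketched roughly as follows. This is not netcf's actual code: `struct reload_throttle`, `throttled_reload()` and `count_reload()` are hypothetical names, and the `reload` callback is only a stand-in for a call such as aug_load(aug).

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical wrapper state; netcf's real structure is not shown in this thread. */
struct reload_throttle {
    time_t last_run;           /* time of the last successful reload */
    bool has_run;              /* false until the first reload has happened */
    double min_interval;       /* seconds that must pass between reloads */
    int (*reload)(void *ctx);  /* stand-in for e.g. aug_load(aug); < 0 means error */
    void *ctx;                 /* opaque argument handed to reload() */
};

/* Call reload() only if at least min_interval seconds have passed since the
 * last successful reload. Returns 1 if reload() ran, 0 if it was skipped,
 * and -1 if reload() failed (the next call will then try again). */
static int throttled_reload(struct reload_throttle *t, time_t now)
{
    assert(t != NULL && t->reload != NULL);
    if (t->has_run && difftime(now, t->last_run) < t->min_interval)
        return 0;              /* too soon: keep using the tree in memory */
    if (t->reload(t->ctx) < 0)
        return -1;
    t->last_run = now;
    t->has_run = true;
    return 1;
}

/* Example callback that only counts how often it is invoked. */
static int count_reload(void *ctx)
{
    ++*(int *)ctx;
    return 0;
}
```

A caller would invoke throttled_reload(&t, time(NULL)) at the top of each public API entry point. The cost is the one mentioned later in this thread: for up to min_interval seconds the in-memory tree may not reflect a file that has just changed.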

But 13 (or even 8) minutes is still a very long time, so I played around a bit in gdb and found that most of the time now seems to be spent in one call to aug_match():

  r = aug_match(aug, path, "/files/etc/sysconfig/network-scripts/*[ DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER = ../*[BRIDGE = 'br1']/DEVICE ]/DEVICE");

(this is the result of a call to netcf's aug_fmt_match() in the netcf function aug_get_xml_for_nif())

When I step over that call to aug_match(), there is a very noticeable pause before the gdb prompt comes back, while continuing from that point all the way through virt-manager's "get all interfaces" loop back to the next call to aug_get_xml_for_nif() (including several other calls to aug_match() that have much simpler search expressions) seems to happen instantly.

So apparently doing a match against all ifcfg files based on this complex match expression is really slowing us down. Any ideas on how to either make this expression simpler, or alternately how to get augeas doing the search more quickly?
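To make clear what that expression asks for, here is a small sketch over a plain in-memory array. It is not how Augeas evaluates path expressions, and nothing in this thread establishes that this is where the time goes; `struct ifcfg`, `match_nested()` and `match_hoisted()` are hypothetical names. The first function re-scans every sibling file for the `MASTER = ../*[BRIDGE = 'br1']/DEVICE` clause, which is quadratic in the number of files; the second collects the DEVICE values of the files with BRIDGE = 'br1' once and compares against that short list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened view of one ifcfg file; NULL means "key not set". */
struct ifcfg { const char *device, *bridge, *master; };

static bool str_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* DEVICE = name or BRIDGE = name or MASTER = name */
static bool direct_match(const struct ifcfg *f, const char *name)
{
    return str_eq(f->device, name) || str_eq(f->bridge, name) ||
           str_eq(f->master, name);
}

/* Literal reading of the expression: for every file, scan all siblings
 * looking for one with BRIDGE = name whose DEVICE equals our MASTER. */
static size_t match_nested(const struct ifcfg *f, size_t n,
                           const char *name, bool *out)
{
    size_t hits = 0;
    assert(f != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool m = direct_match(&f[i], name);
        for (size_t j = 0; !m && j < n; j++)
            m = str_eq(f[j].bridge, name) && str_eq(f[i].master, f[j].device);
        out[i] = m;
        hits += m;
    }
    return hits;
}

/* Same result, but the sibling scan is done once up front. */
static size_t match_hoisted(const struct ifcfg *f, size_t n,
                            const char *name, bool *out)
{
    size_t nports = 0, hits = 0;
    const char **ports = malloc((n ? n : 1) * sizeof *ports);
    assert(f != NULL || n == 0);
    if (ports == NULL)
        return match_nested(f, n, name, out);   /* fall back, still correct */
    for (size_t i = 0; i < n; i++)
        if (str_eq(f[i].bridge, name) && f[i].device != NULL)
            ports[nports++] = f[i].device;
    for (size_t i = 0; i < n; i++) {
        bool m = direct_match(&f[i], name);
        for (size_t k = 0; !m && k < nports; k++)
            m = str_eq(f[i].master, ports[k]);
        out[i] = m;
        hits += m;
    }
    free(ports);
    return hits;
}
```

In Augeas terms the second form would correspond to two simpler queries: one to find the DEVICE of each file with BRIDGE = 'br1', and one that compares MASTER against those literal names.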

Was that with the performance stuff I did a few days ago? (You'd need Augeas HEAD for that.)

Alternatively, can you send me your /etc/sysconfig/network-scripts? (Fair warning: I will have no time to look into this next week.)

I have two questions based on this:

1) has anyone thought about/looked into optimizing/changing the data structure used to store nodes in augeas to scale better with larger datasets (execution time seems to grow faster than linearly)?

From what Dominic turned up, the problem doesn't seem to be so much the data structure for the tree, as the fact that there was some O(n^2) behavior in building intermediate data structures.

2) I recall that a long time ago augeas put in code to re-read/parse files only if they had been modified. netcf (and thus libvirt) could take advantage of this info if it was available in the augeas API - the first time it retrieved the info for an interface it would take a hit, but all subsequent times could be much quicker.

About this one - I'm wondering how well it would work out for augeas to use inotify to learn about modifications to files (including the directory that the ifcfg files live in, in case a new file is created). It works okay for netcf to avoid calling aug_load() (as mentioned above), but it does make me a bit uncomfortable that we sometimes have a mistaken view of the config.

It would definitely be a possibility - we would still need to queue notifications from inotify and only act on them when the user calls aug_load to avoid things like destroying changes the user made; IOW, it still needs to stay predictable when the tree changes based on changes in the FS. It's been a while since I've looked at inotify, but I think it would also introduce a Linux dependency; we could work around that by only using it where available, and falling back to today's behavior.
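A minimal, Linux-only sketch of that "queue now, act at aug_load time" idea follows. Augeas does not do this today as far as this thread says; `watch_dir()` and `drain_changes()` are hypothetical helpers, and the choice of event mask is an assumption. The kernel queues the events on the inotify file descriptor, and nothing is looked at until the caller explicitly drains the queue.

```c
#include <assert.h>
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: start watching one directory (for example the one
 * holding the ifcfg files). Returns a non-blocking inotify fd, or -1. */
static int watch_dir(const char *dir)
{
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CLOSE_WRITE | IN_CREATE | IN_DELETE |
                                   IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper, meant to be called only at the point where the tree
 * is allowed to change (the equivalent of aug_load). Empties the queue and
 * returns the number of events seen, 0 if nothing changed, -1 on error. */
static int drain_changes(int fd)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len < 0) {
            if (errno == EINTR)
                continue;
            return errno == EAGAIN ? count : -1;  /* EAGAIN: queue is empty */
        }
        if (len == 0)
            return count;
        /* Events are variable-length: header followed by ev->len name bytes. */
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            count++;
            p += sizeof *ev + ev->len;
        }
    }
}
```

A result greater than zero would mean "do the full mtime check and reload", and zero would mean the stat() pass can be skipped. A real implementation would also have to treat an IN_Q_OVERFLOW event as "anything may have changed", and keep the existing behavior as the fallback on platforms without inotify.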

