
Re: [augeas-devel] improving performance of aug_get() and aug_match() with large datasets



On 09/22/2015 03:18 PM, Laine Stump wrote:
It was bound to happen eventually. Someone created a host with 514 vlan interfaces each connected to a host bridge, then started up virt-manager. [blah blah boring blah removed]
To update those not included in a separate thread on the topic in netcf-devel (I'll try to keep all discussion here from now on):

Dan Berrange pointed out that netcf was calling aug_load() on each entry to a public netcf API, and libvirt was calling netcf APIs multiple times for each interface. Even though aug_load() checks the mtime of files it has already loaded, and avoids re-loading those that haven't been modified (in this case none have been modified), it turns out that just doing a stat() of 1100 files takes a significant amount of time. So I modified netcf to call aug_load() for this check only if at least 1 second has passed since the last time it was called.

This made a very large improvement, especially when running the upstream versions of all involved packages (virt-manager --> libvirt --> netcf --> augeas). But when running the versions that are included in RHEL6, it wasn't so rosy. A test setup of 514 bridge+vlan interfaces, which took around 30 minutes (!!) to complete a full startup of virt-manager (which calls netcf/augeas to list all interfaces, then gets the XML config for them), now takes 13 minutes with netcf modified to call aug_load() only once per second. (The same operation takes "only" 8 minutes using all upstream code.)
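
A minimal sketch of the once-per-second check described above. The struct and function names are hypothetical and this is not the actual netcf patch; it only shows the time comparison a caller could run before deciding whether to call aug_load().

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical bookkeeping for throttling aug_load(); not netcf's real code. */
struct load_throttle {
    bool   loaded;        /* false until the first load has happened */
    time_t last_load;     /* time of the most recent load */
    time_t min_interval;  /* minimum number of seconds between loads */
};

/* Returns true if the caller should call aug_load() now, and records
 * `now` as the time of that load. Returns false if the previous load
 * is recent enough that the stat() pass can be skipped. */
bool throttle_should_load(struct load_throttle *t, time_t now)
{
    assert(t != NULL);
    if (t->loaded && now - t->last_load < t->min_interval)
        return false;
    t->loaded = true;
    t->last_load = now;
    return true;
}
```

A caller would pass time(NULL) as `now` and call aug_load() only when the helper returns true. The trade-off is the one raised later in this mail: for up to min_interval seconds the in-memory tree may not reflect a file that has just changed.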

But 13 (or even 8) minutes is still a very long time, so I played around a bit in gdb and found that most of the time now seems to be spent in one call to aug_match():


path = "/files/etc/sysconfig/network-scripts/*[ DEVICE = 'br1' or BRIDGE = 'br1' or MASTER = 'br1' or MASTER = ../*[BRIDGE = 'br1']/DEVICE ]/DEVICE";
r = aug_match(aug, path, matches);

(this is the result of a call to netcf's aug_fmt_match() in the netcf function aug_get_xml_for_nif())
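
For readers who don't know netcf: the expression is built from the interface name before it is handed to aug_match(). The sketch below is a hypothetical stand-in for that formatting step, not netcf's aug_fmt_match() itself. It assumes the interface name is the only variable part of the expression.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the match expression for interface `name`
 * into buf. Returns 0 on success, -1 if the name contains a single quote
 * (which would break the quoting) or the buffer is too small. */
int build_nif_match_path(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    if (strchr(name, '\'') != NULL)
        return -1;
    int n = snprintf(buf, size,
                     "/files/etc/sysconfig/network-scripts/*"
                     "[ DEVICE = '%s' or BRIDGE = '%s' or MASTER = '%s'"
                     " or MASTER = ../*[BRIDGE = '%s']/DEVICE ]/DEVICE",
                     name, name, name, name);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Note the last alternative in the predicate: for every candidate file it runs a nested search over all sibling files (../*[BRIDGE = ...]), which is one plausible reason this expression costs so much more than the simpler ones.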

When I step over that call to aug_match(), there is a very noticeable pause before the gdb prompt comes back, while continuing from that point all the way through virt-manager's "get all interfaces" loop back to the next call to aug_get_xml_for_nif() (including several other calls to aug_match() that have much simpler search expressions) seems to happen instantly.

So apparently doing a match against all ifcfg files based on this complex match expression is really slowing us down. Any ideas on how to either make this expression simpler, or alternately how to get augeas doing the search more quickly?


I have two questions based on this:

1) Has anyone thought about or looked into optimizing or changing the data structure used to store nodes in augeas so that it scales better with larger datasets? (Execution time seems to grow faster than linearly with the size of the dataset.)

2) I recall that a long time ago augeas put in code to re-read/parse files only if they had been modified. netcf (and thus libvirt) could take advantage of this info if it was available in the augeas API - the first time it retrieved the info for an interface it would take a hit, but all subsequent times could be much quicker.

About this one - I'm wondering how well it would work out for augeas to use inotify to learn about modifications to files (including the directory that the ifcfg files live in, in case a new file is created). It works okay for netcf to avoid calling aug_load() (as mentioned above), but it does make me a bit uncomfortable that we sometimes have a mistaken view of the config.
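
To make the inotify idea concrete, here is a minimal Linux-only sketch of watching the directory and checking for pending changes. The function names are hypothetical and nothing like this exists in the augeas API as far as this mail is concerned; it only shows the inotify calls a library or caller could use to decide when a reload is really needed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: start watching `dir` for created, deleted,
 * modified, or renamed entries. Returns a non-blocking inotify file
 * descriptor, or -1 on error. */
int config_watch_open(const char *dir)
{
    assert(dir != NULL);
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir,
                          IN_CREATE | IN_DELETE | IN_MODIFY |
                          IN_CLOSE_WRITE | IN_MOVED_FROM | IN_MOVED_TO) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: drain all queued events and report whether
 * anything happened since the last call. Read errors other than an
 * empty queue are treated as "changed" so the caller reloads rather
 * than trusting a stale view. */
bool config_watch_dirty(int fd)
{
    char buf[4096];
    bool dirty = false;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            dirty = true;       /* at least one event was queued */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted, try again */
        if (n < 0 && errno != EAGAIN)
            dirty = true;       /* unexpected error: be conservative */
        break;                  /* queue is empty */
    }
    return dirty;
}
```

With something like this, the reload would happen only when config_watch_dirty() returns true, instead of either stat()ing every file on each call or skipping the check for a fixed interval.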


