[augeas-devel] improving performance of aug_get() and aug_match() with large datasets

David Lutterkort lutter at watzmann.net
Mon Sep 28 22:08:35 UTC 2015


On Thu, Sep 24, 2015 at 2:27 AM, Dominic Cleal <dcleal at redhat.com> wrote:

> Yes, I've seen something similar before - it was reported to us in the
> context of a Puppet provider working on a huge file with many Nagios
> service definitions.  When lots of nodes with the same name, but
> different index (e.g. service[1], service[2]) exist then Augeas is
> extremely slow to traverse paths with a high index value.
>
> I spent a while profiling it and found a couple of very inefficient
> memory operations - here's my branch:
>
> https://github.com/hercules-team/augeas/compare/master...domcleal:ns-filter-perf3
>
> The problem is that it broke a couple of tests and I ran out of time
> before I could find the root cause.
>

I looked at these changes for a bit, and unfortunately, the change that
turns ns_add into something that blindly adds without checking for dupes
breaks a few tests in test-xpath, since things like preceding-sibling can
also produce the same node repeatedly. Clearly, as your test shows, the
O(n^2) behavior of adding nodes to a nodeset is a big problem. I can think
of a few ways to address that:

   - change nodeset so that it contains a hash table of the nodes to make
   checking for dupes fast(er)
   - since ns_add is only called in 4 places, and how we build nodesets is
   very local, make using it a little more complicated: add a flag 'added' to
   struct tree (pahole tells me we have a hole of 4 bytes in that data
   structure, i.e. plenty of space to store such a flag) and then have ns_add
   set that flag and check whether it is set when adding new nodes; make it
   mandatory to call ns_clear_added(ns) after being done with adding nodes to
   a nodeset, i.e. right after whatever loop that builds up the nodeset is done

I also tinkered with a different way to cut down the time that `ns_filter`
takes: just doing the `ns_remove` in batches (~ runs of non-matching nodes)
drastically cuts down the time `ns_filter` takes and doesn't introduce any
memory management headaches.

The patches in https://github.com/hercules-team/augeas/pull/303 change
ns_add as described above and change the internals of ns_filter.

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/augeas-devel/attachments/20150928/c7f9b3bb/attachment.htm>


More information about the augeas-devel mailing list