Watch Performance

Timothy R. Chavez tinytim at us.ibm.com
Mon Apr 17 15:27:34 UTC 2006


On Wed, 2006-04-12 at 17:15 -0400, Amy Griffis wrote:
> Steve Grubb wrote:     [Tue Apr 11 2006, 05:01:23PM EDT]
> > On Tuesday 11 April 2006 12:11, Amy Griffis wrote:
> > > -a exit,always -S chmod -S fchmod -S chown -S fchown -S lchown
> > > -S creat -S open -S truncate -S ftruncate -S mkdir -S rmdir -S unlink
> > > -S rename -S link -S symlink -F watch=/etc/sysconfig/console
> > >
> > > Now you don't have any rules for access(), so using it as the test
> > > case is much more interesting.
> > 
> > OK, I re-worked auditctl to use these syscalls instead of "all". I then
> > re-ran the tests on the same kernel I had been testing on, since lspp.17
> > has the slab debugging options turned on again.
> > 
> > rules  seconds  loss
> >     0       50    0%
> >    10       52    4%
> >    25       56   12%
> >    50       69   38%
> >    75       81   62%
> >    90       87   74%
> 
> Hmm, that's interesting, thanks.
> 
> > The 75-rule performance hit is now 62%, so there is some improvement in
> > performance.  RHEL4 has a 6% hit for 90 rules.
> 
> Do you mean 10 rules + 80 watches?  Or 90 rules?
> 
> > We've narrowed the difference, but I don't consider this solved.
> 
> I think there are three syscall groups for which we want to consider
> the performance impact of having a large number of rules.
> 
> 1. syscalls for which there are no rules
> 
>     In most use cases, the majority of syscalls will fall into this
>     group.  That makes this group the most important because it likely
>     has the most effect on general system performance.  Based on your
>     numbers above, it looks like the impact here is unacceptable.
> 
>     For this group, there should be no filtering overhead at all.  To
>     achieve this, we must have some indication per-list whether there
>     are any rules for a given syscall.  If there are no rules for the
>     syscall, don't even walk the list.
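
To make "don't even walk the list" concrete: one way is a per-list
bitmap of syscall numbers.  A minimal sketch, assuming a fixed syscall
count -- none of these names are actual kernel symbols:

    /* Sketch only: one bit per syscall number, set whenever a rule
     * matching that syscall is added to this filter list. */
    #include <stdbool.h>

    #define NR_SYSCALLS   512
    #define BITS_PER_LONG (8 * sizeof(unsigned long))

    struct audit_filter_list {
        unsigned long syscall_mask[NR_SYSCALLS / BITS_PER_LONG + 1];
        /* ... list head for the rules themselves ... */
    };

    /* Called at rule-insertion time for every syscall the rule names. */
    static void mark_syscall(struct audit_filter_list *l, unsigned int nr)
    {
        l->syscall_mask[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
    }

    /* Checked at syscall entry/exit; if false, skip the list walk. */
    static bool syscall_has_rules(struct audit_filter_list *l, unsigned int nr)
    {
        return (l->syscall_mask[nr / BITS_PER_LONG]
                >> (nr % BITS_PER_LONG)) & 1UL;
    }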
> 
> 2. syscalls for which there are a small number of rules
> 
>     For this group we must walk the list, and some filtering overhead
>     is acceptable.  How much overhead is acceptable depends on how
>     many syscalls we would typically expect to be in this group, and
>     how often we would expect those syscalls to be used.
> 
>     If we need to optimize for this case, we have a couple of options.
> 
>         a) Provide features which reduce the number of rules for a
>            heavy offender.  E.g. for filesystem auditing,
> 
>            - allow multiple watches per rule, akin to multiple inodes
>              per rule
>            - allow a single watch on a directory to apply to many
>              files
> 
>         b) Separate the rules for a heavy offender, e.g. by putting
>            them in a separate list
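
For option (a), the rule structure would presumably just grow a list
of watches, much like the existing multiple-inodes-per-rule support.
A rough sketch, with invented names:

    struct audit_watch {
        struct audit_watch *next;
        const char *path;      /* e.g. "/etc/sysconfig/console" */
        unsigned long ino;     /* inode the path resolved to */
    };

    struct audit_rule_entry {
        struct audit_rule_entry *next;
        struct audit_watch *watches;  /* one rule, many watched paths */
        /* ... syscall mask, action, other fields ... */
    };

One rule entry then covers N paths while adding only a single node to
the list walked at syscall exit.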
> 
> 3. syscalls for which there are a large number of rules
> 
>     For this group, the filtering overhead is the most significant and
>     optimization is more difficult.  For some use cases, having a
>     rules tree instead of a rules list might help.
> 
>     For filesystem auditing, when you want to audit a large number of
>     inodes or watches, being able to audit an entire sub-tree with a
>     single rule would help that particular use case.  However, if you
>     want to audit specific inodes/watches that are spread throughout a
>     filesystem, the syscall-exit-based filtering is always going to
>     exact a penalty.  
> 
>     The only way I can think of to mitigate this is to hang the rule
>     data off of the inodes themselves, and receive a callback for each
>     filesystem event.  You can do the filtering from the callback, and
>     eliminate the list traversal on syscall exit.
>     
>     This is basically the RHEL4 implementation, and there are a few
>     reasons why we can't do it this way right now.  The first reason
>     is that inotify is doing something similar, and we need to attempt
>     to consolidate similar pieces of kernel functionality.  Inotify,
>     however, does not support all of the filesystem events we care
>     about auditing.  Additionally, for those events that inotify does
>     support, its hooks are placed in such a way that events are not
>     produced for failed operations (which we care about).
> 
>     To use this type of implementation in audit, we must either
>     significantly extend inotify or justify to kernel.org our need
>     for our own implementation.  I think the former is preferable.
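
Hanging the rule data off the inodes might look conceptually like the
following -- every name here is made up, and this glosses over locking
and refcounting entirely:

    struct audit_rule_entry;   /* opaque rule data, as elsewhere */

    struct audit_inode_data {
        struct audit_rule_entry *rules;  /* rules watching this inode */
    };

    /* Stand-in for the real struct inode, which would carry a
     * pointer to its audit data. */
    struct inode_stub {
        struct audit_inode_data *audit;
    };

    /* Hypothetical hook called from each filesystem event site, with
     * the operation's error code so failed operations are seen too. */
    static void audit_fs_event(struct inode_stub *inode, int op, int error)
    {
        struct audit_inode_data *d = inode->audit;

        if (!d)
            return;   /* nothing watches this inode: near-zero cost */
        /* match each rule in d->rules against op/error and emit
         * records here, instead of walking a global list on every
         * syscall exit */
    }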
> 
> > I also don't like the idea of handling this by all those syscalls
> 
> Yes, it makes the rules long and ugly.
> 
> Auditctl could support keywords on the command line that map to
> various groups of system calls.  That would be more user-friendly.
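
Hypothetically -- no such keyword exists today -- something like:

    auditctl -a exit,always -S file_modify -F watch=/etc/sysconfig/console

where user space expands file_modify into the whole
chmod/chown/creat/open/truncate/mkdir/unlink/... set from the rule
above, so new syscalls only require a user-space update.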
> 
> > or using "all" because user space tools could get out of sync with
> > the kernel. On any kernel upgrade, there could be a new syscall that
> > allows file system access.  The user space tools wouldn't know about
> > it and wouldn't provide automatic coverage.
> 
> True, we would have to keep an eye on new syscalls.
> 
> Hope this helps,
> Amy
> 

Hi,

Maybe this is a completely stupid thought, but what about adding a
per-syscall filter list table, indexed by system call number?  When a
system call occurs, say open(), we then:

1) Check list_empty(filter_table.entry/exit[5]); if the list is empty,
there are no filter rules for open(), so we exit immediately.

2) Otherwise, we walk the list... and we're only walking the list of
filter rules that apply to open().

There's a space-consumption penalty here... I dunno, I've been out of
the game for some time now.  Just a thought.
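
To make that concrete, a rough sketch -- names invented, and the real
kernel list_empty() takes a struct list_head, where this stub just
checks for NULL:

    #define NR_SYSCALLS 512   /* assumed table size */

    struct audit_entry { struct audit_entry *next; /* rule data... */ };

    struct audit_filter_table {
        struct audit_entry *entry[NR_SYSCALLS]; /* per-syscall entry rules */
        struct audit_entry *exit[NR_SYSCALLS];  /* per-syscall exit rules  */
    };

    /* open() is syscall 5 on i386, so on exit from open(), nr == 5. */
    static void filter_on_exit(struct audit_filter_table *t, unsigned int nr)
    {
        struct audit_entry *e;

        if (t->exit[nr] == NULL)
            return;               /* step 1: no rules for this syscall */

        for (e = t->exit[nr]; e; e = e->next)
            ;  /* step 2: apply only rules registered for syscall nr */
    }

The space cost is two arrays of NR_SYSCALLS pointers per table -- a
few kilobytes on a 32-bit machine -- which seems cheap next to the
list-walk savings.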

-tim
