[PATCH V1] audit: add warning that an old auditd may be starved out by a new auditd

Sun Sep 13 16:08:19 UTC 2015

On 15/09/11, Paul Moore wrote:
> On Fri, Sep 11, 2015 at 6:21 AM, Richard Guy Briggs <rgb at redhat.com> wrote:
> > On 15/09/09, Paul Moore wrote:
> >> On Monday, September 07, 2015 12:58:18 PM Richard Guy Briggs wrote:
> >> > On 15/09/07, Richard Guy Briggs wrote:
> >> > > Nothing prevents a new auditd starting up and replacing a valid
> >> > > audit_pid when an old auditd is still running, effectively starving out
> >> > > the old auditd since audit_pid no longer points to the old valid auditd.
> >> > >
> >> > > There isn't an easy way to detect if an old auditd is still running on
> >> > > the existing audit_pid other than attempting to send a message to see if
> >> > > it fails.  If no message to auditd has been attempted since auditd died
> >> > > unnaturally or got killed, audit_pid will still indicate it is alive.
> >> > >
> >> > > Signed-off-by: Richard Guy Briggs <rgb at redhat.com>
> >> >
> >> > Ok, self-nack on this one for a couple of problems...
> >> > netlink_getsockbyportid() is static to af_netlink.c and "pid" should be
> >> > task_tgid_vnr(current).  Otherwise, any opinions on this approach?
> >> >
> >> > > ---
> >> > > Note: Would it be too bold to actually block the registration of a new
> >> > > auditd if the netlink_getsockbyportid() call succeeded?  Would other
> >> > > checks be appropriate?
> >>
> >> Hmm.  It seems like we should prevent the registration of a new auditd if we
> >> already have an auditd instance connected, although as you say, that isn't the
> >> easiest thing to do.
> >
> > I wanted to do that, but I feared it would carry some risk that an
> > intentional act would be blocked.  In that case, an intentional act
> > could include explicitly killing the old auditd (or process registered
> > as such).
> 
> Well, if you are running another instance of auditd you should have
> the right level of access to kill an older, existing auditd instance
> so I'm not too worried about preventing an intentional act.  Also, if
> we check an existing connection with some sort of heartbeat/ping
> message we wouldn't be preventing it for more than a few times I
> believe.

Agreed.

> >> How painful would it be to return -EAGAIN to the new auditd while sending some
> >> sort of keep-alive/ping/etc. message to the old daemon to check its status?
> >
> > Well, if it turns out that the only reason it ever fails is
> > -ECONNREFUSED, then we just need to check with netlink_getsockbyportid()
> > to see if it fails before accepting the new auditd.
> 
> I'm not sure if the netdev crowd would be interested in exporting
> _getsockbyportid(); for the sake of discussion let's assume no changes
> to the netlink layer, if we get stuck we can revisit this idea.

Agreed.

> > If it is one of the others, can we put the new auditd task on a wait
> > queue until we hear back one way or the other or just timeout on
> > contacting the old auditd?
> 
> Well, what if we don't have anything queued for the old auditd?

Then it won't notice until something does get queued.  That would be the
purpose of sending a ping.

> Although I suppose if nothing else we could send a record indicating
> that another auditd attempted to replace it ... if we can send it
> great, drop the new request and be glad we audited it, if we can't
> send it, reset the auditd tracking.

This is actually a good idea.

> > I'll let Steve speak to dealing with -EAGAIN.  auditlib already deals
> > with -EAGAIN and -EINTR for some cases.  I have a patch that added
> > -ENOBUFS to those cases since I had seen some reports that -ENOBUFS had
> > been returned in some cases (don't remember the circumstances).
> 
> paul moore

- RGB

--
Richard Guy Briggs <rbriggs at redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545