[PATCH][RFC] audit: set wait time to zero when audit failed

Wed Sep 18 12:23:16 UTC 2019

On Tue, Sep 17, 2019 at 9:07 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > -----Original Message-----
> > From: Paul Moore [mailto:paul at paul-moore.com]
> > Sent: September 18, 2019 3:17
> > To: Li,Rongqing <lirongqing at baidu.com>
> > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> >
> > On Mon, Sep 16, 2019 at 9:08 PM Li,Rongqing <lirongqing at baidu.com> wrote:
> > > > -----Original Message-----
> > > > From: Paul Moore [mailto:paul at paul-moore.com]
> > > > Sent: September 17, 2019 6:52
> > > > To: Li,Rongqing <lirongqing at baidu.com>
> > > > Cc: Eric Paris <eparis at redhat.com>; linux-audit at redhat.com
> > > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed

...

> > > I just want it to behave as it did before 3197542482df ("audit: rework
> > > audit_log_start()"): wait 60 seconds once if
> > > auditd/readahead-collector has some problem draining the audit backlog.
> >
> > The patch you mention fixed what was deemed to be buggy behavior; as
> > mentioned previously in this thread I see no good reason to go back to the old
> > behavior.
> >
> > > > If you are not using audit, you can always disable it via the kernel
> > > > command line, or at runtime (look at what Fedora does).
> > > >
> > > > > > You might also want to investigate what is generating so many
> > > > > > audit records prior to starting the audit daemon.
> > > > >
> > > > > It is /sbin/readahead-collector; in fact, we stop auditd. We are
> > > > > doing a reboot test, which reboots the machine continuously to test
> > > > > hardware/software.
> > > > >
> > > > > It is the same as the following:
> > > > > auditctl -a always,exit -S all -F pid='xxx'
> > > > > kill -s 19 `pidof auditd`
> > > > >
> > > > > Then the audited task will hang.
> > > >
> > > > So you are seeing this problem only when you run a test, or did you
> > > > provide this as a reproducer?
> > >
> > > auditctl -a always,exit -S all -F ppid=`pidof sshd`
> > > kill -s 19 `pidof auditd`
> > > ssh root@127.0.0.1
> > >
> > > Then ssh will hang forever.
> >
> > That is expected behavior.  You are putting a massive audit load on the system
> > by telling the kernel to audit every syscall that sshd makes, then you are
> > intentionally killing the audit daemon and attempting to ssh into the system.
> > The proper fix(es) here would be to 1) set reasonable audit rules and/or 2) use
> > an init system that monitors and restarts auditd when it fails (systemd has this
> > capability, I believe some others do as well).
>
> Neither works.
> auditd is not dead; it is in the stopped state (kill -s 19), so systemd/init will not restart it.
> Even with only a few audit rules, after multiple accesses the backlog will fill up because there is no receiver.

Fair point, however I still stand by my previous comments that there
are runtime configuration knobs which can mitigate this problem if it
is something you are concerned about.  Depending on the situation, you
can either increase the backlog to deal with transient problems, or
decrease the backlog wait time (possibly to zero) to prevent blocking
entirely.
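
A minimal sketch of those two knobs, using auditctl's -b and
--backlog_wait_time options. The values are illustrative assumptions,
not recommendations, and the run() helper is a hypothetical wrapper
that only prints each command unless APPLY=1 is set (applying them
requires root).

```shell
#!/bin/sh
# Sketch only: BACKLOG_LIMIT and BACKLOG_WAIT_TIME are assumed example values.
BACKLOG_LIMIT=8192     # larger backlog to absorb transient auditd stalls
BACKLOG_WAIT_TIME=0    # zero: do not make tasks wait when the backlog is full

# Hypothetical helper: print the command by default, execute it if APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Option 1: increase the backlog limit.
run auditctl -b "$BACKLOG_LIMIT"

# Option 2: decrease the backlog wait time (here, to zero).
run auditctl --backlog_wait_time "$BACKLOG_WAIT_TIME"

# Show the kernel's current audit status, including the backlog settings.
run auditctl -s
```

Note the trade-off with a zero wait time: tasks are no longer held up
by a full backlog, but records that do not fit may be lost instead of
delivered once auditd resumes.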

-- 
paul moore
www.paul-moore.com