
Re: Kernel oops+crash on repeated auditd restarts



Please read below.

On Wed, Jan 25, 2012 at 9:20 PM, Eric Paris <eparis redhat com> wrote:
On Wed, 2012-01-25 at 18:45 +0200, Valentin Avram wrote:

> Did anybody ever experience kernel oopses and even kernel crashes
> (after a while), by just restarting repeatedly the auditd daemon?

No, but I'll try to remember to take a look.  We did have a BUG() that
was recently fixed when using -w rules (as I recall).   But I've never
seen this particular NULL pointer bug.  We did recently fix a race in
fsnotify mark destruction that could be this, but those symptoms weren't
exactly the same.

I'm both the upstream Audit and fsnotify maintainer so I'm grumbly at
Gentoo for never letting me know it isn't working.  Where else did you
report this?  I'm wondering where all the information failure is
happening.

I only reported the issue on Gentoo bugs and LKML (the two links I included in the original email). The Gentoo guys did seem interested in the bug at first and asked for a test with a kernel compiled with CONFIG_DEBUG_INFO and CONFIG_DEBUG_LIST. That test suggested that some list is getting corrupted somewhere (although I do some C programming, my knowledge of kernel internals is limited). The LKML guys didn't even bother to answer.
 
Can you send me any and all info you have?


All the information I had is posted on the Gentoo bug report. The two machines I used to test the issue are now in production, so I can't do any testing on them. However, I'll soon have access to a new machine that can stay in test mode for a while, where I plan to retest with Gentoo's latest "stable-marked" kernel, gentoo-sources-3.1.6.

 
I'll see if I can reproduce a problem here (but I'm a Fedora guy)

At the moment I'm not sure whether it's an auditd issue, a kernel issue, or both. However, if you're running a kernel older than 3.0.7 and auditd 2.1.3, I'd be very interested to know whether running the one-liner I posted (auditd start and stop in a loop with a 5-second delay) will eventually (within an hour or so) crash the kernel completely, or at least produce a lot of oopses.
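
For readers without the bug report at hand, a loop of that kind might look roughly like the sketch below. It is not the exact one-liner from the report: the init script path /etc/init.d/auditd, the function name restart_loop, the cycle count, and the placement of the delay after each step are all assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical sketch: repeatedly stop and start auditd with a fixed delay.
# AUDITD_INIT defaults to an assumed init script path; override it if your
# system manages the daemon differently.
AUDITD_INIT="${AUDITD_INIT:-/etc/init.d/auditd}"

# restart_loop COUNT DELAY
# Runs COUNT stop/start cycles, sleeping DELAY seconds after each step.
restart_loop() {
    count="$1"
    delay="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        "$AUDITD_INIT" stop
        sleep "$delay"
        "$AUDITD_INIT" start
        sleep "$delay"
        i=$((i + 1))
    done
}

# Example (as root, on a test machine only): 360 cycles with 5-second
# delays take roughly one hour.
# restart_loop 360 5
```

Watching dmesg or the system log while the loop runs should show whether any oopses appear before a full crash.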

Thank you.

