[Crash-utility] infinite loop in crash due to double-NMI on x86_64 system

Dave Anderson anderson at redhat.com
Fri Jun 25 19:31:58 UTC 2010


----- "Lucas Silacci" <Lucas.Silacci at teradata.com> wrote:

> Hi,
>  
> I've run into an issue where crash will enter an infinite loop while
> decoding exception stacks if those stacks get corrupted.
>  
> We've seen this on four different systems where the hardware generated
> multiple NMIs and the second and subsequent NMIs caused the NMI
> exception stack to be overwritten. When this condition is hit, the
> bottom rsp on the NMI exception stack (which would normally point you
> back to the kernel thread stack or possibly a different exception stack)
> points you back into the middle of the same NMI exception stack. This
> causes crash to infinitely loop when it tries to decode that exception
> stack.
>  
> Now clearly the root cause of the issue is faulty hardware that
> generated multiple NMIs. However, a very small change in crash can
> detect this issue and stop the infinite loop from happening, thereby
> allowing you to get to a point in crash where you can actually tell
> that it was an NMI that caused the system to dump.
>  
> The patch is attached to this email. For x86_64 it will detect the
> condition of any exception stack that points back at itself.
>  
> Please feel free to ask me any questions on this.

Wow, that's pretty interesting -- I've certainly never seen that before.
Can you show me what the backtrace looks like with your patch applied?
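
For reference, the condition described above (a saved rsp on an exception
stack that points back into that same stack) could be checked along the
lines of the sketch below. This is only a standalone illustration, not the
attached patch and not crash's actual code: the struct, its fields, and the
function name are all hypothetical, and it assumes each exception stack can
be described as the address range [base, base + size).

```c
#include <stdint.h>

/* Hypothetical description of one exception stack: [base, base + size). */
struct estack {
    uint64_t base;   /* lowest address of the stack */
    uint64_t size;   /* size of the stack in bytes */
};

/*
 * Return 1 if the rsp saved on this exception stack points back into the
 * same stack, 0 otherwise. A backtrace that follows such an rsp never
 * leaves the stack, so the caller should stop instead of looping.
 */
static int rsp_points_into_same_stack(const struct estack *es,
                                      uint64_t saved_rsp)
{
    /* Check the lower bound first so the subtraction cannot wrap. */
    return saved_rsp >= es->base && saved_rsp - es->base < es->size;
}
```

A stack walker would call something like this before switching to the
stack named by the saved rsp, and report the stack as corrupt rather than
decoding it again.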

Thanks,
  Dave



