Frequent RHEL Server crash/restarts
Barry Brimer
lists at brimer.org
Tue Jul 10 04:08:40 UTC 2007
On Mon, 9 Jul 2007, aix tiger wrote:
> Hi Friends
>
> I am facing a strange problem on one of my RHEL servers: it crashes and restarts frequently. It is an HP ProLiant DL740 and part of a RHEL cluster (V4U4).
>
> Another HP ProLiant DL740 is part of that cluster, with the same version of RHEL and the cluster software, but it has no such problems...
>
> In my /var/log/messages I see no errors. In the HP iLO messages there is no error either, except the message "A critical server error occurred before this POST"...
>
> I have asked the HP hardware engineer to check for all possible hardware errors, but he says the diagnostics show no issues.
>
> How can I troubleshoot this problem? It does not happen at any specific time; it can happen at any moment (at least once a week)... Please advise how to solve this issue.

I had a similar problem with an Oracle RAC cluster. One node rebooted,
one didn't. I have not actually solved the problem; the condition simply
stopped occurring. I set up netconsole (part of netdump), which
eventually told me that hangcheck-timer was rebooting my system. I am
also running hangwatch (http://people.redhat.com/csnook/hangwatch/),
which runs sysrq commands to capture the system state when the system
load spikes.
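
To illustrate the idea behind that kind of watcher, here is a rough
Python sketch. It is not hangwatch's actual code, only a minimal
approximation: it polls the 1-minute load average and, when the load
exceeds a threshold, writes sysrq command keys to /proc/sysrq-trigger
so the kernel dumps memory info ('m') and task states ('t') to the
kernel log. The function names, the threshold of 20.0, and the 5 second
interval are all assumptions made for illustration.

```python
import os
import time

# Standard Linux interface for triggering sysrq commands; writing needs root.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def should_dump(load_1min, threshold):
    """Return True when the 1-minute load average exceeds the threshold."""
    return load_1min > threshold


def dump_state(commands="mt", trigger_path=SYSRQ_TRIGGER):
    """Write each sysrq key to the trigger file, one key per write.

    'm' asks the kernel to log memory info, 't' to log task states.
    """
    for key in commands:
        with open(trigger_path, "w") as trigger:
            trigger.write(key)


def watch(threshold=20.0, interval=5.0, max_checks=None,
          read_load=None, dump=dump_state):
    """Poll the load and call dump() on every check above the threshold.

    threshold and interval are hypothetical defaults; tune them per host.
    max_checks=None means run until interrupted. Returns the dump count.
    """
    if read_load is None:
        # os.getloadavg() returns the (1, 5, 15) minute load averages.
        read_load = lambda: os.getloadavg()[0]
    dumps = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if should_dump(read_load(), threshold):
            dump()
            dumps += 1
        checks += 1
        time.sleep(interval)
    return dumps
```

Run as root, calling watch() with no arguments would loop until
interrupted. The dumps land in the kernel log, so with netconsole
configured they should also reach the remote log host even if the
machine resets before anything is written to local disk.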
HTH,
Barry