[Linux-cluster] FW: cluster crash problem

Andrea Laack alaack at ustrap.com
Wed Feb 20 13:54:01 UTC 2008


 

-----Original Message-----
From: Andrea Laack [mailto:alaack at ustrap.com] 
Sent: Tuesday, February 12, 2008 12:54 PM
To: 'linux-cluster at redhat.com'
Subject: cluster crash problem

We are running RHEL 3.0 with version 1.0.3 of RedHat cluster suite.  We are
utilizing a Promise Vtrak 15200 for shared storage and an Adaptec ASA-7211C
iSCSI initiator.
We are having problems with any process that generates heavy I/O across the
iSCSI link.

Last night our DBA attempted to create an Oracle instance on the shared
storage device.  The cluster crashed and failed over to the backup node.
There was nothing in the logs, even with the log level set at 6.  The only
indication I have that something happened is from the disk I/O graphs
(HotSanic), which show 69.42 Pentabytes (yes, it shows pentabytes).

We are using a watchdog timer.

This has happened before when copying *very* large amounts of data that
include *very* large files.  Copying many small files does not cause the
cluster to crash.

Has anyone seen this type of problem?  Any help will be sincerely
appreciated.  Adaptec will only talk to me if I pay them $199/phone call.

Thanks
Andrea

Andrea Laack
Network Administrator
Universal Strap
W209N17500 Industrial Drive
Jackson, WI  53037
262-677-3641 Ext 5220


Thought I would answer my own questions in the hopes that it can be of use
to someone else.

Thanks to Lon Hohberger.  His information in an email I found in the mailing
list archives sent me in the right direction.  

The Red Hat Knowledgebase gave the following suggestions.

High network traffic or load can cause a cluster node to miss a
heartbeat.  This will cause the node to be fenced.  

The following performance tuning settings were suggested:

cludb -p clumembd%adaptive yes
cludb -p clumembd%interval 2000000
cludb -p clumembd%tko_count 15
cludb -p clumembd%rtp 10
cludb -p cluquorumd%rtp 10

(Be sure to stop cluster services before making these changes.)
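
For anyone who wants to script this, here is a rough sketch of the
stop / change / start sequence.  The cludb lines are the ones listed above.
The init script name "clumanager" and the run() dry-run helper are my own
assumptions for illustration and are not from the Knowledgebase article, so
check the service name on your own nodes first.  By default the script only
prints what it would do.

```shell
#!/bin/sh
# Dry-run by default: prints the commands instead of running them.
# Set APPLY=1 to actually execute them on a cluster member.
# ASSUMPTION: cluster services are controlled by an init script called
# "clumanager"; override CLUSTER_SERVICE if yours is named differently.
CLUSTER_SERVICE="${CLUSTER_SERVICE:-clumanager}"

# Hypothetical helper: run the command when APPLY=1, otherwise print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

apply_tuning() {
    # Stop cluster services first; give up if that fails.
    run service "$CLUSTER_SERVICE" stop || return 1

    # Settings exactly as listed in this post.
    run cludb -p 'clumembd%adaptive' yes &&
    run cludb -p 'clumembd%interval' 2000000 &&
    run cludb -p 'clumembd%tko_count' 15 &&
    run cludb -p 'clumembd%rtp' 10 &&
    run cludb -p 'cluquorumd%rtp' 10 || return 1

    # Bring cluster services back up with the new settings.
    run service "$CLUSTER_SERVICE" start
}

apply_tuning
```

Run it once as-is to review the command list, then again with APPLY=1 on
each member while its cluster services can safely be down.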

I made the above changes and the problems were corrected.
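
To get a rough feel for what the interval and tko_count values mean
together, the arithmetic can be sketched as below.  This assumes that
clumembd%interval is expressed in microseconds and that a node is declared
dead after tko_count consecutive missed heartbeats.  That is my
understanding and not something stated in the Knowledgebase text quoted
above, so treat the result as an estimate.  The function name is made up
for this example.

```shell
#!/bin/sh
# ASSUMPTION: interval is in microseconds, and a member is considered
# dead after tko_count consecutive missed heartbeats.
# Hypothetical helper: prints the approximate detection window in seconds.
detection_window_seconds() {
    interval_us=$1
    tko_count=$2
    echo $(( interval_us * tko_count / 1000000 ))
}

# Values from this post: interval 2000000, tko_count 15.
detection_window_seconds 2000000 15
```

Under those assumptions the settings above give a busy node roughly 30
seconds of missed heartbeats before it is fenced.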

Thanks
Andrea



