
[Linux-cluster] Hard lockups when writing a lot to GFS



I have a two-node setup on a dual-port SCSI SAN.  Note this is just
for test purposes.  Part of the SAN is a GFS filesystem shared between
the two nodes.

When we fetch content to the GFS filesystem via an rsync pull (well,
several rsync pulls) on node 1, it runs for a while and then node 1
hard-locks (nothing on the console, the network dies, the console dies;
it's frozen solid).  Node 2 notices and marks node 1 down
(/proc/cluster/nodes shows an "X" for node 1 under "Sts"), so the
cluster behaviour is OK.  If I run "fence_ack_manual -n node1" on
node 2, it carries on happily, and I can reboot node 1 and everything
returns to normal.
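
For reference, node 2 shows roughly the following after the lockup
(node names and vote counts here are illustrative, not our exact
output):

    # cat /proc/cluster/nodes
    Node  Votes Exp Sts  Name
       1    1    1   X   node1
       2    1    1   M   node2

    # fence_ack_manual -n node1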


The question is: why is node 1 dying like this?  It's important that
we get this sorted out, as we have a LOT of data to synchronize
(rsync is just the test case; we'll probably use a different scheme
in deployment), and I suspect the heavy write activity on that node
is what's causing the crash.
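
For concreteness, the test workload is several pulls of roughly this
shape (the host, module, and target paths are placeholders, not our
real names):

    # Run on node 1; several pulls go in parallel, all writing
    # into the shared GFS mount.
    rsync -a rsync://content-host/module1/ /mnt/gfs/content1/ &
    rsync -a rsync://content-host/module2/ /mnt/gfs/content2/ &
    wait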

Oh, both nodes have the GFS filesystem mounted with "-o rw,noatime".
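
That is, something like this on each node (the pool device and mount
point are placeholders):

    mount -t gfs -o rw,noatime /dev/pool/gfs01 /mnt/gfs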

Any ideas would be GREATLY appreciated!
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens vitalstream com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-      Do you know how to save five drowning lawyers?  No?  GOOD!    -
----------------------------------------------------------------------

