[Linux-cluster] GFS problem

Alex Urbanowicz alex.urbanowicz at gmail.com
Mon Jan 25 17:48:45 UTC 2010


Hello

I have a problem with a shared GFS resource on a 12-node Cluster Manager
cluster.

The cluster starts up properly if all nodes are booted at once. Any major
interaction with one of the nodes (a reboot, a cman restart) causes the GFS
to lock up and the cluster to fall into an unstable split state.

In this state, the logs, clustat and "cman_tool status" all report the
cluster as fully connected and working, while "cman_tool resources" reports
the fence resource alone as stuck in the JOIN_START_WAIT state (or
JOIN_STOP_WAIT, depending on what was done to the cluster in the meantime).
The node sets it lists overlap but differ, depending on which node I run
"cman_tool resources" on.
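To make the "overlapping but different" node sets concrete, a minimal sketch like the one below could compare each node's view of the fence resource membership. It assumes the node IDs have been copied by hand from each node's command output into a dictionary; the sample values and the `compare_views` helper are hypothetical and do not reflect the real output format.

```python
def compare_views(views):
    """Given {node_name: set_of_member_ids}, return (agreed, disputed).

    agreed   -- IDs that every node lists as a member
    disputed -- IDs that some, but not all, nodes list
    """
    if not views:
        return set(), set()
    sets = list(views.values())
    agreed = set.intersection(*sets)
    union = set.union(*sets)
    return agreed, union - agreed


if __name__ == "__main__":
    # Hypothetical per-node views; real data would come from each node.
    views = {
        "node1": {1, 2, 3, 4},
        "node2": {1, 2, 3, 5},
        "node3": {1, 2, 4, 5},
    }
    agreed, disputed = compare_views(views)
    print("agreed:", sorted(agreed))
    print("disputed:", sorted(disputed))
```

A non-empty "disputed" set would mean the nodes disagree about who belongs to the fence resource, which matches the split state described above.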

So far, the only method that gets the cluster out of this state is to
manually reboot all the nodes at once, but this is infeasible given our
uptime expectations and the high load the cluster carries.

We're completely in the dark about the possible cause of the problem; any
help is appreciated.

TIA

Alex