[Linux-cluster] really reliable?

David Teigland teigland at redhat.com
Tue Apr 14 17:50:34 UTC 2009


On Tue, Apr 14, 2009 at 12:17:44PM -0400, Ryan Golhar wrote:
> Is redhat cluster suite really reliable?  I've been having so much 
> trouble getting a cluster up and running,

Problems getting a cluster up are common, usually come down to network issues,
and are very difficult to diagnose.  The cluster software produces almost
indecipherable errors and strange behaviors when the network isn't behaving as
expected.

My usual suggestion is to disable the cman init script, and just run
"ccsd; cman_tool join" on the nodes.  Then watch the output of
"cman_tool nodes", and "cman_tool status", observing how long it takes
the nodes to recognize each other.  If a steady-state cluster membership
takes more than a few seconds to form, you may have some network
problems.
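
As a rough sketch of that check, the polling can be scripted so the time to
form membership is measured rather than eyeballed.  The function names below
are made up for illustration, and the parsing assumes "cman_tool nodes" prints
one row per node with the status in the second column and "M" meaning member;
verify that against the output on your own nodes before relying on it.

```shell
#!/bin/sh
# Count the nodes that cman currently reports as members.
# Assumption: status is the second column of "cman_tool nodes" and
# "M" marks a member; the header row and non-member rows are skipped.
count_members() {
    cman_tool nodes | awk '$2 == "M" { n++ } END { print n + 0 }'
}

# Poll once a second until at least $1 members are seen.
# Prints the elapsed seconds on success; returns 1 if $2 seconds
# (default 30) pass without reaching the expected count.
wait_for_members() {
    expected=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        if [ "$(count_members)" -ge "$expected" ]; then
            echo "$elapsed"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Example use on a three-node cluster, after "ccsd; cman_tool join":
#   wait_for_members 3 30 || cman_tool status
```

A result of more than a few seconds, or a timeout, is the cue to start
looking at the network rather than at the cluster configuration.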

To successfully administer a cluster, you really need to be proficient in
using cman_tool to start up, monitor and shut down the nodes.  The cman init
script does a bunch of things for you, which is great when everything is
working, but when something doesn't work the init script can leave a big
complicated mess that's impossible to sort out.


> I've installed just the bare minimum (before even getting to GFS) to 
> test the cluster software.  Just starting cman cluster services fails on 
> two of the nodes.

That's the right approach, but as mentioned above, you probably need to pare
things down to just cman_tool if network problems are at the root.

> Even when I try to reboot the nodes, I can't because the whole system 
> hangs on various processes that don't ever shut down.  I have to 
> physically reboot these boxes.

If something has gone wrong, it's often impossible to shut down without a hard
reboot.  Even when things are working, rebooting can be a delicate task
because the system may easily be configured to stop things in the wrong order,
and one thing out of place can cause a wreck.
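
To make the ordering concrete, here is a minimal sketch of a top-down stop
sequence.  It is an assumption about a typical stack, not a description of
any particular setup: the rgmanager, gfs and clvmd init script names are
assumed to match what is installed, layers you don't run should be removed,
and stopping ccsd afterwards is left out.  By default it only prints the
steps so the order can be reviewed first.

```shell
#!/bin/sh
# Print a step (default) or execute it when DRY_RUN=0.
# When executing, a failed step is reported and returns non-zero.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@" || { echo "step failed: $*" >&2; return 1; }
    fi
}

# Stop the layers from the top down: cluster services, GFS mounts,
# clustered LVM, the fence domain, and cluster membership last.
# The && chain stops at the first failure so that lower layers are
# not torn down underneath something that is still running.
# Assumption: these init script names exist on this system.
stop_cluster_stack() {
    run_step service rgmanager stop &&
    run_step service gfs stop &&
    run_step service clvmd stop &&
    run_step fence_tool leave &&
    run_step cman_tool leave
}

# Example use:
#   stop_cluster_stack              # dry run, prints the five steps
#   DRY_RUN=0 stop_cluster_stack    # actually runs them, in order
```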

Dave
