[Linux-cluster] Cluster Shutdown - ideas?

Christine Caulfield ccaulfie at redhat.com
Tue Aug 12 10:50:33 UTC 2008


One thing that cman does rather badly is a full cluster shutdown. With
the RHEL4 code you would shut each node down in turn using the init
scripts and find that everything hung, because the cluster lost quorum
when the N/2th node went down.
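To make the quorum arithmetic concrete, here is a minimal Python sketch. It uses a simplified model that is an assumption for illustration, not cman's actual code: every node has one vote, quorum is expected_votes // 2 + 1, and nodes shut down one at a time without expected votes being reduced (the RHEL4 behaviour). The function names are hypothetical.

```python
# Simplified quorum model (an assumption for illustration, not cman's code):
# every node has one vote and quorum = expected_votes // 2 + 1.


def quorum(expected_votes: int) -> int:
    """Votes needed for the cluster to be quorate."""
    return expected_votes // 2 + 1


def shutdown_that_loses_quorum(n_nodes: int) -> int:
    """Shut nodes down one at a time without touching expected votes
    (RHEL4-style) and return which shutdown leaves the cluster inquorate."""
    needed = quorum(n_nodes)
    remaining = n_nodes
    for k in range(1, n_nodes + 1):
        remaining -= 1
        if remaining < needed:
            return k
    raise RuntimeError("unreachable: zero nodes can never be quorate")


for n in (3, 4, 5, 8):
    print(n, quorum(n), shutdown_that_loses_quorum(n))
# 3 2 2
# 4 3 2
# 5 3 3
# 8 5 4
```

Under this model the cluster always hangs at shutdown number N - N//2, which is roughly the N/2th node.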

With RHEL5, the init script was changed to do a "cman_tool leave remove",
which tells the remaining nodes to reduce quorum to allow for the
missing node(s).

I don't really like either of these solutions. The RHEL4 way is
obviously a nuisance, but even the RHEL5 system is wrong IMHO. A normal
node shutdown should not reduce quorum: if other nodes fail while that
node is down, the reduced quorum puts the cluster at risk of a split brain.
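One illustrative way the split-brain risk can play out is sketched below, again under a simplified model that is an assumption and not cman's real logic: one vote per node, quorum = expected_votes // 2 + 1, "leave remove" modelled as lowering expected votes by one per departing node, and rebooted nodes starting from the configured node count. Real cman has more inputs (per-node votes, a quorum disk, the two_node special case).

```python
# Simplified model (assumption): one vote per node,
# quorum = expected_votes // 2 + 1, and "cman_tool leave remove"
# lowers expected votes by one for each departing node.

CONFIGURED_NODES = 5


def quorum(expected_votes: int) -> int:
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    return votes_present >= quorum(expected_votes)


# Three nodes shut down with "leave remove"; the two survivors
# lower expected votes from 5 to 2, so their quorum drops to 2.
expected_after_remove = CONFIGURED_NODES - 3
survivors = 2

# Later the three shut-down nodes boot while cut off from the survivors.
# They never saw the "remove", so they start from the configured count.
rebooted = 3

survivors_quorate = is_quorate(survivors, expected_after_remove)  # 2 >= 2
rebooted_quorate = is_quorate(rebooted, CONFIGURED_NODES)         # 3 >= 3
print(survivors_quorate, rebooted_quorate)  # True True
```

Both partitions consider themselves quorate, which is exactly the split brain that a normal (non-removing) leave would avoid.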

Those of you who have worked with VMS systems know that VMS has a
CLUSTER_SHUTDOWN option, which causes the cluster software to wait until
all nodes have reached a shutdown barrier and then lets all of them go
down at the same time. We could do this on Linux, but I'm not really
sure how useful it would be. The main reason is that the cluster
software sits at a higher level in the OS than it does in VMS, so there
is a lot more for the computer to do after the cluster software has shut
down. It is an option, though.
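The barrier semantics can be illustrated with a minimal single-process Python sketch, where threads stand in for cluster nodes and threading.Barrier stands in for what would really have to be a distributed barrier over cman's messaging. The node names and helper functions are hypothetical; this only shows the ordering guarantee, not how cman would implement it.

```python
import random
import threading
import time

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names

events: list[str] = []
events_lock = threading.Lock()
shutdown_barrier = threading.Barrier(len(NODES))


def log(msg: str) -> None:
    with events_lock:
        events.append(msg)


def node_shutdown(name: str) -> None:
    # Each node reaches the barrier at a different time, as it would
    # when administrators shut the nodes down one after another.
    time.sleep(random.uniform(0.0, 0.05))
    log(f"{name} waiting")
    # No node proceeds until every node has arrived, so members stay in
    # the cluster (and keep it quorate) while the others catch up.
    shutdown_barrier.wait()
    log(f"{name} down")


threads = [threading.Thread(target=node_shutdown, args=(n,)) for n in NODES]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("\n".join(events))
```

The arrival order varies from run to run, but every "waiting" line is always printed before any "down" line.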

The other option is simply to set a flag (either in cman or locally) to
tell the node or the whole cluster that everyone is being shut down.
There are a few ways of doing this; the simplest is to add a flag to the
cman init script (basically the opposite of what happens now in RHEL5)
that makes it run "cman_tool leave remove". But that requires the
cluster software to be shut down independently of the rest of the
software, which defeats the point of ordered init scripts.

So the flag could perhaps be an environment variable checked by the
init script (do those get passed through?), or a flag inside cman
itself that changes the "leave" behaviour to do either a "leave remove"
or the synchronised cluster shutdown I mentioned earlier.
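As a sketch of the environment-variable variant, the decision the init script would make might look like this in Python. The variable name CMAN_CLUSTER_SHUTDOWN and the function leave_command are hypothetical; nothing in cman defines them, and whether such a variable actually reaches the init script is the open question above. Only "cman_tool leave" and "cman_tool leave remove" come from the text.

```python
import os
from typing import Mapping

# Hypothetical variable name; nothing in cman defines it.
FLAG = "CMAN_CLUSTER_SHUTDOWN"


def leave_command(env: Mapping[str, str] = os.environ) -> list[str]:
    """Pick the cman_tool invocation for this node's shutdown.

    A normal node shutdown leaves quorum alone; only when the whole
    cluster is being taken down do we ask the remaining nodes to
    reduce quorum with "leave remove".
    """
    if env.get(FLAG, "").lower() in ("1", "yes", "true"):
        return ["cman_tool", "leave", "remove"]
    return ["cman_tool", "leave"]


print(leave_command({}))            # ['cman_tool', 'leave']
print(leave_command({FLAG: "yes"}))  # ['cman_tool', 'leave', 'remove']
```

A real script would then run the chosen command, for example with subprocess.run(leave_command(), check=True). Putting the flag inside cman instead would move this decision out of the init script entirely.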

Does anyone have any preferences, ideas or other options we might consider?

Chrissie