[Linux-cluster] 32 nodes limit?


I have what I hope is a quick question about the cluster suite.

The current lock manager FAQ states ...

"CMAN in RHEL4 has known problems when you have more than 32 nodes in the cluster. We're working to resolve those issues, but until then use GULM if you have more than 32 nodes."

... while the pre-Wiki version of this document referred to DLM instead of CMAN in RHEL4. Which one is it? DLM makes more sense to me.

In any case, I gather that this issue has been resolved. If so, can you tell me the minimum version of the cluster suite and/or upstream kernel that would allow for more than 32 nodes (with DLM)? A pointer to a patch or patches that I could use would be ideal.

More details ...

I'm trying to move a 5TB filespace from NFS to GFS2. I have a P4 (the current NFS server) and 33 Opteron nodes, all running a stock 2.6.22 kernel, OpenAIS 0.80.3, and the 2.00.00 cluster suite. For now, I've dummied out fencing and set expected_votes to 1. I can start and stop cman on all nodes without problems. With cman running on every node, I formatted, mounted and populated the filesystem from the P4. Mounting the filesystem on the Opterons one by one succeeds until the 32nd node, at which point mount.gfs2 hangs (in state "D" according to `ps ax`). Going back, the first 16 systems that mounted the filesystem can still `ls` the top-level directory, but attempts to do so on the remaining systems also get stuck in "D". Any attempt to unmount the filesystem then puts the entire setup in "D".
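
In case it helps anyone reproduce the check, here is a minimal sketch of how the tasks stuck in "D" could be listed on a node without reading through `ps ax` by hand. It is not part of the setup described above. The function names are made up for illustration, and it assumes a Linux /proc in which each /proc/<pid>/stat starts with the pid, the command name in parentheses, and a one-letter state.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    if not os.path.isdir(proc_root):
        return stuck
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or entry unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling `uninterruptible_tasks()` on each node after a mount attempt would return the pids and command names (for example mount.gfs2) that are in uninterruptible sleep at that moment.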

Due to various considerations, moving to more recent versions is not the preferred option at this point. Hence my question.

Any ideas?



