[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Linux-cluster] CLVM/GFS will not mount or communicate with cluster





On Sat, 9 Dec 2006, Robert Peterson wrote:

Barry Brimer wrote:
Everything was working fine for several months on this cluster. The cluster software is the latest provided by Red Hat for RHEL4. Latest kernel. I am using fence_ilo, and the working node fenced the problem node.

Same versions - RHEL 4 Red Hat Latest:

I've since discovered that another GFS cluster (non-production) had a similar issue, and a reboot on both nodes solved this problem. With the original (production) cluster, I am trying to figure out how to get the problem node back into the cluster without having to unmount the GFS volume from the remaining working node.

Thank you so much for your input, it is greatly appreciated.

If you have any more suggestions, particularly on how to get my problem node back into the cluster without unmounting the GFS volume from the working node, please let me know.

Thanks,
Barry
Hi Barry,

Hm. Can it be that your other nodes are still running the old cman in memory?
This might happen if you've updated the kernel and cman packages to the latest
with up2date but haven't yet rebooted or loaded the new cluster kernel modules on the remaining nodes. That would also explain why a reboot solved the problem in the other cluster you wrote about. Perhaps you should run "uname -a" on all nodes and make sure they're all running the same kernel. If the working node(s) and the rebooted node are running the same kernel, then they're also running the same cluster modules,
i.e., cman, in which case your problem might be a new bug.
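A minimal sketch of that check: gather the kernel release from each node and flag any mismatch. The function below only compares the strings it is given; the node names and ssh invocation in the comments are placeholders, not part of the original advice.

```shell
#!/bin/sh
# Hedged sketch: flag kernel-release mismatches between cluster nodes.
# In practice you would feed it live data, e.g.:
#   check_kernels "$(uname -r)" "$(ssh node2 uname -r)"
# where "node2" is a placeholder hostname.
check_kernels() {
    # Takes one kernel release string per node; prints MATCH if all are
    # identical, MISMATCH (and returns nonzero) otherwise.
    first="$1"
    for k in "$@"; do
        if [ "$k" != "$first" ]; then
            echo "MISMATCH"
            return 1
        fi
    done
    echo "MATCH"
}
```

If the releases differ, the node running the older kernel is also running the older cluster modules and is the one that needs the reboot.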

Even if they're not running the same kernel, the cman modules speak a compatible protocol, unless the working node(s) are still running the U1 version of cman. If the working node turns out to be running the old U1 cman, even though the new RPMs are installed, you may want to reboot it to pick up the new kernel and cluster modules.
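One way to compare what cman is actually speaking on each node is to read the protocol version out of `cman_tool status`. A small sketch of that extraction follows; the sample status text is an illustration I've written in, not output captured from a real cluster.

```shell
#!/bin/sh
# Hedged sketch: extract the "Protocol version" field from cman_tool
# status-style output so it can be compared across nodes.
protocol_version() {
    # Reads status text on stdin, prints the value after "Protocol version:".
    sed -n 's/^Protocol version:[[:space:]]*//p'
}

# Illustrative sample (assumption, not real captured output):
sample="Protocol version: 5.0.1
Config version: 7
Membership state: Cluster-Member"

# On a live node you would instead run: cman_tool status | protocol_version
printf '%s\n' "$sample" | protocol_version   # prints: 5.0.1
```

If the extracted versions differ between the rebooted node and the working node, the working node is still running the old module in memory even though the new RPMs are on disk.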

Bob,

Thank you for your responses. They are greatly appreciated. These systems never ran 4U1; they were installed current in June. I ended up having to schedule maintenance downtime, as these are production systems. As expected, once both systems were rebooted, the cluster regained quorum with no problem and all services were re-established without issue. I was hoping there might be a way to regain cluster services without taking down the service that runs on the GFS volume, but I couldn't find one. Thanks again for your help.

Barry

