[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Linux-cluster] CLVM/GFS will not mount or communicate with cluster





On Sat, 9 Dec 2006, Robert Peterson wrote:

Barry Brimer wrote:
Everything was working fine for several months on this cluster. The cluster software is the latest provided by Red Hat for RHEL4. Latest kernel. I am using fence_ilo, and the working node fenced the problem node.

Same versions - RHEL 4 Red Hat Latest:

I've since discovered that another GFS cluster (non-production) had a similar issue, and a reboot on both nodes solved this problem. With the original (production) cluster, I am trying to figure out how to get the problem node back into the cluster without having to unmount the GFS volume from the remaining working node.

Thank you so much for your input, it is greatly appreciated.

If you have any more suggestions, particularly on how to get my problem node back into the cluster without unmounting the GFS volume from the working node, please let me know.

Thanks,
Barry
Hi Barry,

Hm. Can it be that your other nodes are still running the old cman in memory?
This might happen if you've updated the kernel and cman packages to the latest
with up2date but haven't yet rebooted or loaded the new cluster kernel modules on the remaining nodes. That would also explain why a reboot solved the problem in the other cluster you wrote about. Perhaps you should run "uname -a" on all nodes and make sure they're all running the same kernel. If the working node(s) and the rebooted node are running the same kernel, then they're also running the same cluster modules,
i.e., cman, in which case your problem might be a new bug.
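A minimal sketch of that check: gather the kernel release from each node and flag any mismatch. The function below only compares the strings it is given; the node names and ssh invocation in the comments are placeholders, not part of the original advice.

```shell
#!/bin/sh
# Hedged sketch: flag kernel-release mismatches between cluster nodes.
# In practice you would feed it live data, e.g.:
#   check_kernels "$(uname -r)" "$(ssh node2 uname -r)"
# where "node2" is a placeholder hostname.
check_kernels() {
    # Takes one kernel release string per node; prints MATCH if all are
    # identical, MISMATCH (and returns nonzero) otherwise.
    first="$1"
    for k in "$@"; do
        if [ "$k" != "$first" ]; then
            echo "MISMATCH"
            return 1
        fi
    done
    echo "MATCH"
}
```

If the releases differ, the node running the older kernel is also running the older cluster modules and is the one that needs the reboot.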

Even if they're not running the same kernel, the cman modules speak a compatible protocol, unless the working node(s) are still running the U1 version of cman. If the working node turns out to be running the old U1 cman, even though the new RPMs are installed, you may want to reboot it to pick up the new kernel and cluster modules.
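One way to compare what cman is actually speaking on each node is to read the protocol version out of `cman_tool status`. A small sketch of that extraction follows; the sample status text is an illustration I've written in, not output captured from a real cluster.

```shell
#!/bin/sh
# Hedged sketch: extract the "Protocol version" field from cman_tool
# status-style output so it can be compared across nodes.
protocol_version() {
    # Reads status text on stdin, prints the value after "Protocol version:".
    sed -n 's/^Protocol version:[[:space:]]*//p'
}

# Illustrative sample (assumption, not real captured output):
sample="Protocol version: 5.0.1
Config version: 7
Membership state: Cluster-Member"

# On a live node you would instead run: cman_tool status | protocol_version
printf '%s\n' "$sample" | protocol_version   # prints: 5.0.1
```

If the extracted versions differ between the rebooted node and the working node, the working node is still running the old module in memory even though the new RPMs are on disk.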

Bob,

Thank you for your responses. They are greatly appreciated. These systems never ran 4U1; they were installed current in June. I ended up having to schedule maintenance downtime, as these are production systems. As expected, once both systems were rebooted, the cluster regained quorum with no problem and all services were re-established without issue. I was hoping there might be a way to regain cluster services without taking down the service that runs on the GFS volume, but I couldn't find one. Thanks again for your help.

Barry

