[Linux-cluster] CLVM/GFS will not mount or communicate with cluster
Barry Brimer
lists at brimer.org
Sat Dec 9 17:01:38 UTC 2006
On Sat, 9 Dec 2006, Robert Peterson wrote:
> Barry Brimer wrote:
>> Everything was working fine for several months on this cluster. The
>> cluster software is the latest provided by Red Hat for RHEL4. Latest
>> kernel. I am using fence_ilo, and the working node fenced the problem
>> node.
>>
>> Same versions on both nodes: RHEL 4, latest Red Hat packages.
>>
>> I've since discovered that another GFS cluster (non-production) had a
>> similar issue, and a reboot on both nodes solved this problem. With the
>> original (production) cluster, I am trying to figure out how to get the
>> problem node back into the cluster without having to unmount the GFS volume
>> from the remaining working node.
>>
>> Thank you so much for your input, it is greatly appreciated.
>>
>> If you have any more suggestions, particularly on how to get my problem
>> node back into the cluster without unmounting the GFS volume from the
>> working node, please let me know.
>>
>> Thanks,
>> Barry
> Hi Barry,
>
> Hm. Can it be that your other nodes are still running the old cman in
> memory? This might happen if you update the kernel code and cman code
> with up2date to the latest, but haven't rebooted or loaded the new
> cluster kernel modules yet on the remaining nodes. That would also
> explain why a reboot solved the problem in the other cluster you wrote
> about. Perhaps you should do "uname -a" on all nodes and make sure
> they're all running the same kernel. If the working node(s) and the
> rebooted node are both running the same kernel, then they will also be
> running the same cluster modules, i.e., cman, in which case your
> problem might be a new bug.
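>
> Something along these lines would do it (just a sketch; it assumes
> password-less ssh, and you'd substitute your real node names for
> "node1 node2"):
>
>   # running kernel vs. newest installed kernel package on each node
>   for node in node1 node2; do
>       echo "== $node =="
>       ssh $node 'uname -r; rpm -q kernel kernel-smp | grep -v "not installed" | tail -1'
>   done
>
> A node whose "uname -r" is older than its newest installed kernel rpm
> has been updated but never rebooted, which is exactly the situation
> I'm describing.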
>
> Even if they're not running the same kernel, the cman modules have a
> compatible protocol, unless the U1 version of cman is still running on
> the working node(s). If the working node is found to be running the
> old U1 cman, even if the new RPMs are installed, you may want to
> reboot in order to pick up the new kernel and cluster modules.
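>
> You can also compare the cman that is actually loaded in the kernel
> with what is installed on disk. Roughly (exact field names vary a bit
> between updates):
>
>   # protocol version of the cman currently running in the kernel
>   cman_tool status | grep -i version
>
>   # cman packages installed on disk
>   rpm -q cman cman-kernel cman-kernel-smp
>
> If the running protocol version predates the installed cman-kernel
> rpm, that node is still on the old module and needs a reboot (or a
> module reload) to pick it up.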
Bob,
Thank you for your responses. They are greatly appreciated. These
systems never ran 4U1; they were installed from the then-current release
in June. I ended up having to schedule a maintenance window, as these
are production systems. As expected, once both systems were rebooted,
the cluster regained quorum with no problem and all services were
established without issue. I was hoping there might have been a way to
regain cluster services without taking down the service that runs on
the GFS volume, but I couldn't find one.
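
In case it helps anyone who finds this thread in the archives, verifying
the recovery amounts to something like the following (standard RHEL4
cluster-suite commands; exact output varies by update level):

  # confirm the cluster is quorate and both members have rejoined
  cman_tool status | grep -i quorum
  cat /proc/cluster/nodes

  # confirm services and the GFS mounts came back
  clustat
  mount -t gfs

Thanks again for your help.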
Barry