
Re: [Linux-cluster] CLVM/GFS will not mount or communicate with cluster



Barry Brimer wrote:
This is a repeat of the post I made a few minutes ago.  I thought adding a
subject would be helpful.


I have a 2 node cluster for a shared GFS filesystem.  One of the nodes fenced
the other, and the node that got fenced is no longer able to communicate with
the cluster.

While booting the problem node, I receive the following error message:
Setting up Logical Volume Management:  Locking inactive: ignoring clustered
volume group vg00

I have compared /etc/lvm/lvm.conf files on both nodes.  They are identical.  The
disk (/dev/sda1) is listed when typing "fdisk -l"

There are no iptables firewalls active (although /etc/sysconfig/iptables exists,
iptables is chkconfig'd off).  I have written a simple iptables logging rule
(iptables -I INPUT -s <problem node> -j LOG) on the working node to verify that
packets are reaching the working node, but no messages are being logged in
/var/log/messages on the working node that acknowledge any cluster activity
from the problem node.

Both machines have the same RH packages installed and are mostly up to date.
They are missing the same updates, none of which involve the kernel, RHCS or
GFS.

When I boot the problem node, it successfully starts ccsd, but it fails after a
while on cman and fails after a while on fenced.  I have given the clvmd
process an hour, and it still will not start.

vgchange -ay on the problem node returns:

# vgchange -ay
  connect() failed on local socket: Connection refused
  Locking type 2 initialisation failed.

I have the contents of /var/log/messages on the working node and the problem
node at the time of the fence, if that would be helpful.

Any help is greatly appreciated.

Thanks,
Barry

Hi Barry,

Well, vgchange and other lvm functions won't work on the clustered volume
unless clvmd is running, and clvmd won't run properly until the node is
talking happily through the cluster infrastructure. So as I see it, your
problem is that cman is not starting properly. Unfortunately, you haven't
told us enough about the system to determine why. There can be many reasons.

For now, let me assume that the two nodes were working properly in a cluster
before one was fenced, and therefore I'll assume that the software and
configurations are all okay. I think one reason this might happen is if
you're using manual fencing and haven't yet done your:

fence_ack_manual -n <fenced_node>

on the remaining node to acknowledge that the reboot actually happened.

Also, you might want to test communications between the boxes to make
sure they can communicate with each other in general.

You might also get this kind of problem if you had updated the cluster
software, so that the cman on one node is incompatible with the cman on the
other. Ordinarily, there are no problems or incompatibilities with upgrading,
but if you upgraded cman from RHEL4U1 to RHEL4U4, for example, you might
get this because the cman protocol changed slightly between RHEL4U1 and U2.

Next time, it would also be helpful to post what version of the cluster software
you're running and possibly snippets from /var/log/messages showing why
cman is not connecting.
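
As a rough sketch of how that information might be gathered on each node,
something like the following could be used. The package names passed to rpm,
the helper function names, and the grep pattern are assumptions for
illustration, not an official procedure; adjust them to whatever is actually
installed. Commands that are not present are reported rather than run.

```shell
#!/bin/sh
# Sketch only: helper names, package names and the grep pattern are
# illustrative assumptions, not part of any cluster tool.

# Run a command only if it is installed; otherwise say it is missing.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $*"
        "$@" 2>&1
    else
        echo "== $1: not found on this host"
    fi
}

# Print the last 50 log lines that mention the cluster daemons.
cluster_log_lines() {
    logfile="${1:-/var/log/messages}"
    if [ -r "$logfile" ]; then
        grep -iE 'cman|fenced|ccsd|clvmd' "$logfile" | tail -n 50
    else
        echo "cannot read $logfile"
    fi
}

# Collect versions, membership state, basic connectivity and log lines.
collect_cluster_info() {
    peer="$1"                                          # hostname of the other node
    run_if_present rpm -q cman cman-kernel ccs fence   # assumed package names
    run_if_present cman_tool status
    run_if_present cman_tool nodes
    run_if_present ping -c 3 "$peer"
    echo "== recent cluster lines from /var/log/messages"
    cluster_log_lines /var/log/messages
}
```

Running "collect_cluster_info <other_node>" on both nodes and comparing the
output should show whether the cman versions differ and whether the nodes can
reach each other at all.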

Regards,

Bob Peterson
Red Hat Cluster Suite

