
Re: [Linux-cluster] cman startup issue

On Wed, 7 Nov 2007, Patrick Caulfield wrote:

I'm having a weird problem. I am using a shared GFS root file system,
and the same initrd image on all the machines. The cluster has 3
machines on it at the moment, and 1 refuses to join the cluster,
regardless of which order I bring them up in.

When cman service is being started, it fails when starting cman:

cman not started: Can't find local node name in cluster.conf
/usr/local/sbin/cman_tool: aisexec daemon didn't start

If I try to run aisexec, I get:
aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.

Where should I be looking for causes of this? I double checked my
cluster.conf and the MAC addresses, IP addresses and interface names are
correct in each node's config.

Check that the new node can write into /tmp, where it is trying to store
the current ring-id. It could be SELinux or perhaps the permissions on
the file it is trying to create.
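
A quick way to test the /tmp suggestion above is to try creating a file
there. This is only a minimal sketch: the function name can_write_to is
made up for illustration, and the probe runs as whatever user and SELinux
context you launch it from, which may not be the same as aisexec's, so a
passing result does not rule out a denial for the daemon itself.

```python
import tempfile


def can_write_to(directory):
    """Return True if a file can be created and written in `directory`.

    Hypothetical helper for illustration; it is not part of cman or
    openais.
    """
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"ring-id probe")
            handle.flush()
        return True
    except OSError:
        # Covers a missing directory, bad permissions, and EACCES from
        # an SELinux denial.
        return False


if __name__ == "__main__":
    print("/tmp writable:", can_write_to("/tmp"))
```

Run it as the same user that starts cman (normally root) on the node that
fails to join.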

That fixed the aisexec problem, but the "Can't find local node name in
cluster.conf" problem remains, and cman still won't start. :-(

Well, it won't start if it can't find the local node name in
cluster.conf ...
Have you double-checked that the name(s) in cluster.conf match those on
the ethernet interfaces?

You mean as in:
<eth name="eth1" mac="my:ma:ca:dd:re:ss" ip=""

If so, then yes, I checked it about 10 times. That was the first thing I
thought was wrong. :-(

As I don't have your cluster.conf or access to your DNS server it's hard to say
from here, but that message does mean what it says. If you have older software
it might not detect anything other than the node's main hostname, but later
versions will check all the interfaces on the system for something that matches
anything in cluster.conf.
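
To narrow this down, it can help to list the node names in cluster.conf
next to the names the machine reports for itself. The sketch below is
only a rough approximation under stated assumptions: it compares the
clusternode name attributes against the local hostname and its short
form, which is not necessarily how any given cman version does the
lookup. The helper names and the sample configuration are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET

# Hypothetical sample; on a real node, read the actual cluster.conf.
SAMPLE_CONF = """
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1"/>
    <clusternode name="node2.example.com" nodeid="2"/>
  </clusternodes>
</cluster>
"""


def node_names(cluster_conf_xml):
    """Return the name attribute of every <clusternode> element."""
    root = ET.fromstring(cluster_conf_xml)
    return [n.get("name") for n in root.iter("clusternode") if n.get("name")]


def local_candidates():
    """Return the local hostname and its short form, without duplicates."""
    full = socket.gethostname()
    short = full.split(".")[0]
    return [full] if full == short else [full, short]


def matching_names(cluster_conf_xml, local_names):
    """Return configured node names that exactly equal a local name."""
    wanted = set(local_names)
    return [n for n in node_names(cluster_conf_xml) if n in wanted]


if __name__ == "__main__":
    print("configured nodes:", node_names(SAMPLE_CONF))
    print("local names:     ", local_candidates())
    print("matches:         ", matching_names(SAMPLE_CONF, local_candidates()))
```

An empty list of matches on the failing node, but not on its identical
twin, would point at a hostname or name-resolution difference between
the two machines rather than at cluster.conf itself.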

Well, the thing that really puzzles me is that the same cluster used to
work before. All I effectively did was move it to a different IP range
and change cluster.conf. I can't figure out what could have changed in
the meantime to break it, other than cluster.conf. The only other thing
that's different is that some of the machines have eth1 and eth0
reversed. Before, they all used eth1 for cluster communication, and now
one of them uses eth0 (slightly different model, and the manufacturer
mislabeled the ports on them). But I have two identical machines, and
one connects while the other doesn't. It really has me stumped.

I see you're using eth1 so make sure you do have an up-to-date cman.

I'm running the latest that is available for RHEL5.

