[Linux-cluster] cman startup issue

gordan at bobich.net
Wed Nov 7 14:39:50 UTC 2007


On Wed, 7 Nov 2007, Patrick Caulfield wrote:

>>>>>>>>> I'm having a weird problem. I am using a shared GFS root file
>>>>>>>>> system,
>>>>>>>>> and the same initrd image on all the machines. The cluster has 3
>>>>>>>>> machines on it at the moment, and 1 refuses to join the cluster,
>>>>>>>>> regardless of which order I bring them up in.
>>>>>>>>>
>>>>>>>>> When cman service is being started, it fails when starting cman:
>>>>>>>>>
>>>>>>>>> cman not started: Can't find local node name in cluster.conf
>>>>>>>>> /usr/local/sbin/cman_tool: aisexec daemon didn't start
>>>>>>>>>
>>>>>>>>> If I try to run aisexec, I get:
>>>>>>>>> aisexec: totemsrp.c:2867: memb_ring_id_store: Assertion `0' failed.
>>>>>>>>>
>>>>>>>>> Where should I be looking for causes of this? I double checked my
>>>>>>>>> cluster.conf and the MAC addresses, IP addresses and interface
>>>>>>>>> names are
>>>>>>>>> correct in each node's config.
>>>>>>>> Check that the new node can write into /tmp - where it is trying to
>>>>>>>> store the
>>>>>>>> current ring-id.  It could be SElinux or perhaps the permissions on
>>>>>>>> the file it
>>>>>>>> is trying to create.
>>>>>>> That fixed the aisexec problem, but the "Can't find local node name in
>>>>>>> cluster.conf" problem remains, and cman still won't start. :-(
>>>>>> Well, it won't start if it can't find the local node name in
>>>>>> cluster.conf ...
>>>>>> Have you double-checked that the name(s) in cluster.conf match those
>>>>>> on the
>>>>>> ethernet interfaces ?
>>>>> You mean as in:
>>>>> <eth name="eth1" mac="my:ma:ca:dd:re:ss" ip="10.1.2.3"
>>>>> mask="255.255.255.0"/>
>>>>> ?
>>>>>
>>>>> If so, then yes, I checked it about 10 times. That was the first thing I
>>>>> thought was wrong. :-(
>>>> As I don't have your cluster.conf or access to your DNS server it's
>>>> hard to say
>>>> from here, but that message does mean what it says. If you have older
>>>> software
>>>> it might not detect anything other than the node's main hostname, but
>>>> later
>>>> versions will check all the interfaces on the system for something
>>>> that matches
>>>> anything in cluster.conf.
>>> Well, the thing that really puzzles me is that the same cluster used to
>>> work before. All I effectively did was move it to a different IP range
>>> and changed cluster.conf. I can't figure out what could have changed in
>>> the meantime to break it, other than cluster.conf. The only other thing
>>> that's different is that some of the machines have eth1 and eth0
>>> reversed. Before they all used eth1 for cluster configuration, and now
>>> one of them uses eth0 (slightly different model, and the manufacturer
>>> mislabeled the ports on them). But I have two identical machines, and one
>>> connects, the other doesn't. It really has me stumped.
>>>
>>>> I see you're using eth1 so make sure you do have an up-to-date cman.
>>> I'm running the latest that is available for RHEL5.
>>
>> If that's what came with 5.0 then there's a bug in the name matching. I can't
>> figure out from the CVS tags in which package this was fixed unfortunately.
>>
>> "revision 1.26
>>  date: 2007/03/15 11:12:33;  author: pcaulfield;  state: Exp;  lines: +16 -13
>>  If the machine is multi-homed, then using a truncated name in uname but not in
>>  cluster.conf would fail to match them up."
>
> Well, I can tell you that the fix is NOT in cman-2.0.61, and it IS in
> cman-2.0.73. Sorry I can't be more specific!

Assuming that's what's causing my problem, it's not in 2.0.64, as that is 
what I have.
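
If it is that multi-homed truncation bug, my reading of the changelog is
roughly the following (hostnames here are invented for illustration and
this is only a sketch of a strict string comparison, not cman's actual
matching code):

```shell
# Sketch of the mismatch described in revision 1.26, assuming the bug is
# a strict comparison between uname's node name and the cluster.conf name.
UNAME_NODE="node1"                 # truncated name, as uname -n might return
CONF_NAME="node1.example.com"      # fully-qualified name in cluster.conf

# Pre-fix behaviour: a strict comparison fails to match the local node.
[ "$UNAME_NODE" = "$CONF_NAME" ] && echo "strict match" || echo "strict mismatch"

# Post-fix behaviour: comparing against the first label also succeeds.
[ "$UNAME_NODE" = "${CONF_NAME%%.*}" ] && echo "truncated match"
```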

Is there a workaround? What triggers the bug? Can I make it go away by 
using different node names? Is it affected by DNS?
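
In the meantime, a couple of sanity checks I can run on the failing node
(the cluster.conf path is assumed to be the default, and the grep pattern
is just a guess at pulling node names out of it):

```shell
# 1. The node name cman will try to find in cluster.conf:
uname -n

# 2. The node names actually present in cluster.conf (default path assumed):
grep -o 'clusternode name="[^"]*"' /etc/cluster/cluster.conf 2>/dev/null \
    || echo "no cluster.conf at the default path"

# 3. /tmp must be writable, since aisexec stores the ring id there:
touch /tmp/.ringid_test && echo "/tmp writable" && rm -f /tmp/.ringid_test
```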

Gordan
