[Linux-cluster] Re: [Openais] Basic cluster not starting

Steven Dake sdake at redhat.com
Sun Jul 8 01:06:07 UTC 2007


James,

Let me speak with Patrick Caulfield on this topic Monday.

I have not seen this before in any of our testing, but it is possible
someone else using RHCS has.  I've also copied the linux-cluster list.

However, the problem appears to be related to ccs or to the startup
order.  The openais code knows nothing about the ccsd node ids or about
parsing the xml configuration file.  That work is done by ccsd and
cman.
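
Since the node ids come from the configuration file rather than from
openais, one quick sanity check is to confirm that the cluster.conf on
disk really carries a nodeid for every clusternode. The following is
only a rough sketch under stated assumptions: check_nodeids and SAMPLE
are hypothetical names made up for illustration, the sample document is
a cut-down stand-in for a real file, and this is not how ccsd or cman
parse the file internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down cluster.conf used only for illustration.
SAMPLE = """<cluster name="alpha_cluster" config_version="8">
  <clusternodes>
    <clusternode name="node1" nodeid="1" votes="1"/>
    <clusternode name="node2" votes="1"/>
  </clusternodes>
</cluster>"""


def check_nodeids(conf_text):
    """Return (config_version, problems) for a cluster.conf document.

    A problem is reported for every clusternode that has no nodeid
    attribute or that reuses a nodeid already seen on another node.
    """
    root = ET.fromstring(conf_text)
    problems = []
    seen = {}  # nodeid -> name of the first node that used it
    for node in root.findall("./clusternodes/clusternode"):
        name = node.get("name")
        nodeid = node.get("nodeid")
        if nodeid is None:
            problems.append(f"{name}: missing nodeid")
        elif nodeid in seen:
            problems.append(
                f"{name}: nodeid {nodeid} already used by {seen[nodeid]}"
            )
        else:
            seen[nodeid] = name
    return root.get("config_version"), problems


if __name__ == "__main__":
    version, problems = check_nodeids(SAMPLE)
    print("config_version:", version)
    for problem in problems:
        print("problem:", problem)
```

To run it against a real file, read /etc/cluster/cluster.conf and pass
its contents to check_nodeids. It may also be worth comparing the
config_version it reports with the version ccsd prints in
/var/log/messages: the file quoted below says config_version="8", while
the ccsd log line says version = 6.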

Did you try the cman init script?

Regards
-steve

On Thu, 2007-07-05 at 14:21 -0400, james anderson wrote:
> I am attempting to install GFS on FC6 64bit using RPMs.
> Below you will find my config and steps taken to get a GFS cluster
> running.
> I am unclear if the problem is with OpenAIS or RHCS.
>  
>  
> FC6 64bit RPMs
> --------------
> rpm -ivh openais-0.80.1-3.x86_64.rpm
> rpm -ivh perl-Net-Telnet-3.03-5.noarch.rpm
> rpm -ivh cman-2.0.18-2.fc6.x86_64.rpm
> System config cluster
> rpm -ivh system-config-cluster-1.0.29-1.0.noarch.rpm
> Luci
> rpm -ivh python-imaging-1.1.6-3.fc6.x86_64.rpm
> rpm -ivh zope-2.9.7-2.fc6.x86_64.rpm
> rpm -ivh plone-2.5.3-1.fc6.x86_64.rpm
> rpm -ivh luci-0.9.3-2.fc6.x86_64.rpm
> Ricci
> rpm -ivh --nodeps oddjob-libs-0.27-8.x86_64.rpm
> rpm -ivh oddjob-0.27-8.x86_64.rpm
> rpm -ivh modcluster-0.9.3-2.fc6.x86_64.rpm
> rpm -ivh ricci-0.9.3-2.fc6.x86_64.rpm
>  
>  
> /etc/cluster/cluster.conf
> -------------------------
> <?xml version="1.0"?>
> <cluster alias="alpha_cluster" config_version="8"
> name="alpha_cluster">
>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="node1" nodeid="1" votes="1">
>       <multicast addr="239.192.196.121" interface="eth1"/>
>       <fence>
>         <method name="1">
>           <device name="nps1" port="1" switch="1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node2" nodeid="2" votes="1">
>       <multicast addr="239.192.196.121" interface="eth0"/>
>       <fence>
>         <method name="1">
>           <device name="nps1" port="2" switch="1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node3" nodeid="3" votes="1">
>       <multicast addr="239.192.196.121" interface="eth2"/>
>       <fence>
>         <method name="1">
>           <device name="nps1" port="3" switch="1"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman>
>     <multicast addr="239.192.196.121"/>
>   </cman>
>   <fencedevices>
>     <fencedevice agent="fence_apc" ipaddr="10.1.1.123" login="root"
> name="***" passwd="***"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains/>
>     <resources/>
>   </rm>
> </cluster>
>  
>  
> Commands
> --------
> # modprobe lock_dlm
> # modprobe dlm
> # mount -t configfs non /sys/kernel/config
> # ccsd
> # cman_tool join
>  
>  
> /var/log/messages
> -----------------
> 1 Jul 2 14:50:16 node1 ccsd[22457]: Starting ccsd 2.0.18:
> 2 Jul 2 14:50:16 node1 ccsd[22457]: Built: Oct 1 2006 17:18:46
> 3 Jul 2 14:50:16 node1 ccsd[22457]: Copyright (C) Red Hat, Inc. 2004
> All rights reserved.
> 4 Jul 2 14:50:45 node1 ccsd[22457]: Unable to connect to cluster
> infrastructure after 30 seconds.
> 5 Jul 2 14:51:15 node1 ccsd[22457]: Unable to connect to cluster
> infrastructure after 60 seconds.
> 6 Jul 2 14:51:39 node1 ccsd[22457]: cluster.conf (cluster name =
> alpha_cluster, version = 6) found.
> 7 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive Service
> RELEASE 'subrev 1204 version 0.80.1'
> 8 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2002-2006
> MontaVista Software, Inc and contributors.
> 9 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Copyright (C) 2006 Red
> Hat, Inc.
> 10 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] No nodeid specified in
> cluster.conf
> 11 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] Error reading CCS
> info, cannot start
> 12 Jul 2 14:51:41 node1 openais[22542]: [MAIN ]
> 13 Jul 2 14:51:41 node1 openais[22542]: [MAIN ] AIS Executive exiting
> (-9).
> 14 Jul 2 14:51:45 node1 ccsd[22457]: Unable to connect to cluster
> infrastructure after 90 seconds.
> 15 Jul 2 14:52:15 node1 ccsd[22457]: Unable to connect to cluster
> infrastructure after 120 seconds.
> 16 Jul 2 14:52:44 node1 ccsd[22457]: Stopping ccsd, SIGTERM received.
>  
> Lines 1-6 are from running the "ccsd" command above.
> Lines 7-13 are from running the "cman_tool join" command above.
>  
> I also received the following error message:
> cman not started: CCS does not have a nodeid for this node, run
> 'ccs_tool addnodeids' to fix
> cman_tool: aisexec daemon didn't start
>  
> Yes, I did try running ccs_tool addnodeids. It did not help. Notice
> that in cluster.conf the nodeids were already in place. Any pointers
> to narrowing down my problem are appreciated.
>  
> Thanks,
> James
>  
> 
> 
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais



