[Linux-cluster] GFS on CentOS - cman unable to start

Mon Jan 9 11:08:25 UTC 2012

Hi,
Check /etc/sysconfig/cman; a different name may be set there as
NODENAME. Remove the file (if present), or try creating one like this:

#CMAN_CLUSTER_TIMEOUT=120
#CMAN_QUORUM_TIMEOUT=0
#CMAN_SHUTDOWN_TIMEOUT=60
FENCED_START_TIMEOUT=120
##FENCE_JOIN=no
#LOCK_FILE="/var/lock/subsys/cman"
CLUSTERNAME=ClusterName
NODENAME=NodeName
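
Before setting NODENAME, it may help to see which clusternode entry the
local hostname could match, and whether each declared name resolves to a
non-loopback address. What follows is a rough Python sketch and does not
reproduce cman's own lookup logic: the helper names (node_names,
match_local_node, is_loopback, check) are made up for illustration, and
the short-name fallback is only an assumption about what a reasonable
match looks like.

```python
import ipaddress
import socket
import xml.etree.ElementTree as ET


def node_names(cluster_conf_xml):
    """Return the clusternode names declared in a cluster.conf document."""
    root = ET.fromstring(cluster_conf_xml)
    return [node.get("name") for node in root.iter("clusternode")]


def match_local_node(names, hostname):
    """Return the declared name equal to hostname or to its short form.

    Assumption: trying the FQDN first and then the short name is a
    plausible heuristic; it is not taken from the cman sources.
    """
    short = hostname.split(".", 1)[0]
    for candidate in (hostname, short):
        if candidate in names:
            return candidate
    return None


def is_loopback(address):
    """True if the address is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(address).is_loopback


def check(path="/etc/cluster/cluster.conf"):
    """Print what each declared node name resolves to on this machine."""
    with open(path, "rb") as fh:
        names = node_names(fh.read())
    local = socket.gethostname()
    print("local hostname:", local)
    print("matching clusternode:", match_local_node(names, local))
    for name in names:
        try:
            addr = socket.gethostbyname(name)
        except OSError as exc:
            print(name, "does not resolve:", exc)
            continue
        note = " (loopback, not valid cluster-wide)" if is_loopback(addr) else ""
        print(name, "->", addr + note)
```

Calling check() on each node should show whether a declared name fails to
resolve or maps to the loopback address. If "matching clusternode" prints
None, the NODENAME setting above is one way to tell cman which name to use.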

On Sun, 08 Jan 2012 20:03:18 -0800, Wes Modes <wmodes at ucsc.edu> wrote:
> The behavior of cman's resolving of cluster node names is less than
> clear, as per the RHEL bugzilla report.
> 
> The hostname and cluster.conf match, as do /etc/hosts and uname -n.
> The short names and FQDN ping.  I believe the cluster.conf files on all
> nodes are in sync, and all nodes are accessible to each other using
> either short or long names.
> 
> You'll have to trust that I've tried everything obvious, and every
> possible combination of FQDN and short names in cluster.conf and
> hostname.  That said, it is totally possible I missed something obvious.
> 
> I suspect there is something else going on, and I don't know how to get
> at it.
> 
> Wes
> 
> 
> On 1/6/2012 6:06 PM, Kevin Stanton wrote:
>>
>> > Hi,
>>
>> > I think CMAN expects the names of the cluster nodes to be the same as
>> > those returned by the command "uname -n".
>>
>> > From what you write, your nodes' hostnames are test01.gdao.ucsc.edu
>> > and test02.gdao.ucsc.edu, but in cluster.conf you have declared only
>> > "test01" and "test02".
>>
>>  
>>
>> I haven't found this to be the case in the past.  I actually use a
>> separate short name to reference each node which is different than the
>> hostname of the server itself.  All I've ever had to do is make sure
>> it resolves correctly.  You can do this either in DNS and/or in
>> /etc/hosts.  I have found that it's a good idea to do both in case
>> your DNS server is a virtual machine and is not running for some
>> reason.  In that case with /etc/hosts you can still start cman.  
>>
>>  
>>
>> I would make sure whatever node names you use in the cluster.conf will
>> resolve when you try to ping it from all nodes in the cluster.  Also
>> make sure your cluster.conf is in sync between all nodes.
>>
>>  
>>
>> -Kevin
>>
>>  
>>
>>  
>>
>>
>> ------------------------------------------------------------------------
>>
>>     These servers are currently on the same host, but may not be in
>>     the future.  They are in a vm cluster (though honestly, I'm not
>>     sure what this means yet).
>>
>>     SELinux is enabled, but in permissive mode.
>>     Firewalling through iptables is turned off via
>>     system-config-securitylevel
>>
>>     There is no line currently in the cluster.conf that deals with
>>     multicasting.
>>
>>     Any other suggestions?
>>
>>     Wes
>>
>>     On 1/6/2012 12:05 PM, Luiz Gustavo Tonello wrote:
>>
>>     Hi,
>>
>>      
>>
>>     Are these servers on VMware? On the same host?
>>
>>     Is SELinux disabled? Does iptables have any rules?
>>
>>      
>>
>>     In my environment I had a problem starting GFS2 with servers on
>>     different hosts.
>>
>>     To cluster the servers, I needed to migrate one server to the same
>>     host as the other and restart it.
>>
>>      
>>
>>     I think one of the problems was caused by the virtual switches.
>>
>>     To solve it, I changed the multicast IP to 225.0.0.13 in
>>     cluster.conf
>>
>>       <multicast addr="225.0.0.13"/>
>>
>>     and added a static route on both nodes to use the default gateway.
>>
>>      
>>
>>     I don't know if it's correct, but this solved my problem.
>>
>>      
>>
>>     I hope that helps you.
>>
>>      
>>
>>     Regards.
>>
>>      
>>
>>     On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes at ucsc.edu> wrote:
>>
>>     Hi, Steven.
>>
>>     I've tried just about every possible combination of hostname and
>>     cluster.conf.
>>
>>     ping to test01 resolves to 128.114.31.112
>>     ping to test01.gdao.ucsc.edu resolves to 128.114.31.112
>>
>>     It feels like the right thing is being returned.  This feels like it
>>     might be a quirk (or bug possibly) of cman or openais.
>>
>>     There are some old bug reports around this, for example
>>     https://bugzilla.redhat.com/show_bug.cgi?id=488565.  It sounds like the
>>     way that cman reports this error is anything but straightforward.
>>
>>     Is there anyone who has encountered this error and found a solution?
>>
>>     Wes
>>
>>
>>
>>     On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
>>     > Hi,
>>     >
>>     > On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>>     >> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS systems
>>     >> running on VMware. The GFS FS is on a Dell EqualLogic SAN.
>>     >>
>>     >> I keep running into the same problem despite many differently-flavored
>>     >> attempts to set up GFS. The problem comes when I try to start cman, the
>>     >> cluster management software.
>>     >>
>>     >>     [root at test01]# service cman start
>>     >>     Starting cluster:
>>     >>        Loading modules... done
>>     >>        Mounting configfs... done
>>     >>        Starting ccsd... done
>>     >>        Starting cman... failed
>>     >>     cman not started: Can't find local node name in cluster.conf /usr/sbin/cman_tool: aisexec daemon didn't start
>>     >>                                                              [FAILED]
>>     >>
>>     > This looks like what it says... whatever the node name is in
>>     > cluster.conf, it doesn't exist when the name is looked up, or possibly
>>     > it does exist, but is mapped to the loopback address (it needs to map to
>>     > an address which is valid cluster-wide)
>>     >
>>     > Since your config files look correct, the next thing to check is what
>>     > the resolver is actually returning. Try (for example) a ping to test01
>>     > (you need to specify exactly the same form of the name as is used in
>>     > cluster.conf) from test02 and see whether it uses the correct ip
>>     > address, just in case the wrong thing is being returned.
>>     >
>>     > Steve.
>>     >
>>     >>     [root at test01]# tail /var/log/messages
>>     >>     Jan  5 13:39:40 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193640 seconds.
>>     >>     Jan  5 13:40:10 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193670 seconds.
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service: started and ready to provide service.
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name "test01.gdao.ucsc.edu" not found in cluster.conf
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS info, cannot start
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading config from CCS
>>     >>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
>>     >>
>>     >> Here are details of my configuration:
>>     >>
>>     >>     [root at test01]# rpm -qa | grep cman
>>     >>     cman-2.0.115-85.el5_7.2
>>     >>
>>     >>     [root at test01]# echo $HOSTNAME
>>     >>     test01.gdao.ucsc.edu
>>     >>
>>     >>     [root at test01]# hostname
>>     >>     test01.gdao.ucsc.edu
>>     >>
>>     >>     [root at test01]# cat /etc/hosts
>>     >>     # Do not remove the following line, or various programs
>>     >>     # that require network functionality will fail.
>>     >>     128.114.31.112      test01 test01.gdao test01.gdao.ucsc.edu
>>     >>     128.114.31.113      test02 test02.gdao test02.gdao.ucsc.edu
>>     >>     127.0.0.1               localhost.localdomain localhost
>>     >>     ::1             localhost6.localdomain6 localhost6
>>     >>
>>     >>     [root at test01]# sestatus
>>     >>     SELinux status:                 enabled
>>     >>     SELinuxfs mount:                /selinux
>>     >>     Current mode:                   permissive
>>     >>     Mode from config file:          permissive
>>     >>     Policy version:                 21
>>     >>     Policy from config file:        targeted
>>     >>
>>     >>     [root at test01]# cat /etc/cluster/cluster.conf
>>     >>     <?xml version="1.0"?>
>>     >>     <cluster config_version="25" name="gdao_cluster">
>>     >>         <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>>     >>         <clusternodes>
>>     >>             <clusternode name="test01" nodeid="1" votes="1">
>>     >>                 <fence>
>>     >>                     <method name="single">
>>     >>                         <device name="gfs_vmware"/>
>>     >>                     </method>
>>     >>                 </fence>
>>     >>             </clusternode>
>>     >>             <clusternode name="test02" nodeid="2" votes="1">
>>     >>                 <fence>
>>     >>                     <method name="single">
>>     >>                         <device name="gfs_vmware"/>
>>     >>                     </method>
>>     >>                 </fence>
>>     >>             </clusternode>
>>     >>         </clusternodes>
>>     >>         <cman/>
>>     >>         <fencedevices>
>>     >>             <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>>     >>             <fencedevice agent="fence_vmware" name="gfs_vmware"
>>     >>                 ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>>     >>                 vmlogin="root" vmpasswd="esxpass"
>>     >>                 port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>>     >>         </fencedevices>
>>     >>         <rm>
>>     >>         <failoverdomains/>
>>     >>         </rm>
>>     >>     </cluster>
>>     >>
>>     >> I've seen much discussion of this problem, but no definitive solutions.
>>     >> Any help you can provide will be welcome.
>>     >>
>>     >> Wes Modes
>>     >>
>>     >> --
>>     >> Linux-cluster mailing list
>>     >> Linux-cluster at redhat.com
>>     >> https://www.redhat.com/mailman/listinfo/linux-cluster
>>     >
>>
>>
>>
>>      
>>
>>     -- 
>>     Luiz Gustavo P Tonello.
>>
>>
>>