
Re: [Linux-cluster] GFS on CentOS - cman unable to start



Hi,

Are these servers on VMware, and on the same host? Is SELinux disabled? Does iptables have any rules that could interfere?
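
A quick way to check both (standard commands; output will vary by setup):

    sestatus          # SELinux status and current mode
    iptables -L -n    # list active firewall rules, numeric addresses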

In my environment, I had a problem starting GFS2 with the servers on different hosts. To get them to cluster, I had to migrate one server to the same host as the other and restart it.

I think part of the problem was the virtual switches.
To work around it, I changed the multicast address in cluster.conf to 225.0.0.13:
  <multicast addr="225.0.0.13"/>
and added a static route on both nodes so that traffic uses the default gateway.
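
In case it's useful, a sketch of what that can look like (the interface name is only an example; adjust for your environment):

    # send multicast traffic (224.0.0.0/4) out a specific interface
    route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0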

I don't know if this is the correct fix, but it solved my problem.

I hope it helps.

Regards.

On Fri, Jan 6, 2012 at 5:01 PM, Wes Modes <wmodes ucsc edu> wrote:
Hi, Steven.

I've tried just about every possible combination of hostname and
cluster.conf.

ping to test01 resolves to 128.114.31.112
ping to test01.gdao.ucsc.edu resolves to 128.114.31.112

So the right thing seems to be returned. This feels like it might be a
quirk (or possibly a bug) in cman or openais.
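
For comparison, getent queries the libc resolver (the same path the
cluster daemons should be using), so it can be a more faithful check
than ping:

    getent hosts test01
    getent hosts test01.gdao.ucsc.edu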

There are some old bug reports around this, for example
https://bugzilla.redhat.com/show_bug.cgi?id=488565.  It sounds like the
way that cman reports this error is anything but straightforward.

Is there anyone who has encountered this error and found a solution?

Wes


On 1/6/2012 2:00 AM, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2012-01-05 at 13:54 -0800, Wes Modes wrote:
>> Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS systems
>> running on VMware. The GFS filesystem is on a Dell EqualLogic SAN.
>>
>> I keep running into the same problem despite many differently-flavored
>> attempts to set up GFS. The problem comes when I try to start cman, the
>> cluster management software.
>>
>>     [root test01]# service cman start
>>     Starting cluster:
>>        Loading modules... done
>>        Mounting configfs... done
>>        Starting ccsd... done
>>        Starting cman... failed
>>     cman not started: Can't find local node name in cluster.conf
>> /usr/sbin/cman_tool: aisexec daemon didn't start
>>                                                                [FAILED]
>>
> This looks like what it says... whatever the node name is in
> cluster.conf, it doesn't exist when the name is looked up, or possibly
> it does exist, but is mapped to the loopback address (it needs to map to
> an address which is valid cluster-wide).
>
> Since your config files look correct, the next thing to check is what
> the resolver is actually returning. Try (for example) a ping to test01
> (you need to specify exactly the same form of the name as is used in
> cluster.conf) from test02 and see whether it uses the correct ip
> address, just in case the wrong thing is being returned.
>
> Steve.
>
>>     [root test01]# tail /var/log/messages
>>     Jan  5 13:39:40 testbench06 ccsd[13194]: Unable to connect to
>> cluster infrastructure after 1193640 seconds.
>>     Jan  5 13:40:10 testbench06 ccsd[13194]: Unable to connect to
>> cluster infrastructure after 1193670 seconds.
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>> Service RELEASE 'subrev 1887 version 0.80.6'
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>> 2002-2006 MontaVista Software, Inc and contributors.
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C)
>> 2006 Red Hat, Inc.
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>> Service: started and ready to provide service.
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name
>> "test01.gdao.ucsc.edu" not found in cluster.conf
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS
>> info, cannot start
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading
>> config from CCS
>>     Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive
>> exiting (reason: could not read the main configuration file).
>>
>> Here are details of my configuration:
>>
>>     [root test01]# rpm -qa | grep cman
>>     cman-2.0.115-85.el5_7.2
>>
>>     [root test01]# echo $HOSTNAME
>>     test01.gdao.ucsc.edu
>>
>>     [root test01]# hostname
>>     test01.gdao.ucsc.edu
>>
>>     [root test01]# cat /etc/hosts
>>     # Do not remove the following line, or various programs
>>     # that require network functionality will fail.
>>     128.114.31.112      test01 test01.gdao test01.gdao.ucsc.edu
>>     128.114.31.113      test02 test02.gdao test02.gdao.ucsc.edu
>>     127.0.0.1               localhost.localdomain localhost
>>     ::1             localhost6.localdomain6 localhost6
>>
>>     [root test01]# sestatus
>>     SELinux status:                 enabled
>>     SELinuxfs mount:                /selinux
>>     Current mode:                   permissive
>>     Mode from config file:          permissive
>>     Policy version:                 21
>>     Policy from config file:        targeted
>>
>>     [root test01]# cat /etc/cluster/cluster.conf
>>     <?xml version="1.0"?>
>>     <cluster config_version="25" name="gdao_cluster">
>>         <fence_daemon post_fail_delay="0" post_join_delay="120"/>
>>         <clusternodes>
>>             <clusternode name="test01" nodeid="1" votes="1">
>>                 <fence>
>>                     <method name="single">
>>                         <device name="gfs_vmware"/>
>>                     </method>
>>                 </fence>
>>             </clusternode>
>>             <clusternode name="test02" nodeid="2" votes="1">
>>                 <fence>
>>                     <method name="single">
>>                         <device name="gfs_vmware"/>
>>                     </method>
>>                 </fence>
>>             </clusternode>
>>         </clusternodes>
>>         <cman/>
>>         <fencedevices>
>>             <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
>>             <fencedevice agent="fence_vmware" name="gfs_vmware"
>> ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
>> vmlogin="root" vmpasswd="esxpass"
>> port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
>>         </fencedevices>
>>         <rm>
>>         <failoverdomains/>
>>         </rm>
>>     </cluster>
>>
>> I've seen much discussion of this problem, but no definitive solutions.
>> Any help you can provide will be welcome.
>>
>> Wes Modes
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster redhat com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> Linux-cluster mailing list
> Linux-cluster redhat com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster



--
Luiz Gustavo P Tonello.
