[Linux-cluster] GFS on CentOS - cman unable to start

Wes Modes wmodes@ucsc.edu
Thu Jan 5 21:54:25 UTC 2012


Howdy, y'all. I'm trying to set up GFS in a cluster on CentOS systems
running on VMware. The GFS filesystem is on a Dell EqualLogic SAN.

I keep running into the same problem despite many differently-flavored
attempts to set up GFS. The problem comes when I try to start cman, the
cluster management software.

    [root@test01]# service cman start
    Starting cluster:
       Loading modules... done
       Mounting configfs... done
       Starting ccsd... done
       Starting cman... failed
    cman not started: Can't find local node name in cluster.conf
    /usr/sbin/cman_tool: aisexec daemon didn't start
                                                               [FAILED]
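
For what it's worth, my understanding (possibly wrong) is that cman takes
the local node name from uname -n and wants an exact match against a
<clusternode name="..."> entry, so this is the comparison I keep coming
back to; the output below is just what the configs shown further down
give me:

    [root@test01]# uname -n
    test01.gdao.ucsc.edu
    [root@test01]# grep "clusternode name" /etc/cluster/cluster.conf
            <clusternode name="test01" nodeid="1" votes="1">
            <clusternode name="test02" nodeid="2" votes="1">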

    [root@test01]# tail /var/log/messages
    Jan  5 13:39:40 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193640 seconds.
    Jan  5 13:40:10 testbench06 ccsd[13194]: Unable to connect to cluster infrastructure after 1193670 seconds.
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Copyright (C) 2006 Red Hat, Inc.
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive Service: started and ready to provide service.
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] local node name "test01.gdao.ucsc.edu" not found in cluster.conf
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading CCS info, cannot start
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] Error reading config from CCS
    Jan  5 13:40:24 testbench06 openais[3939]: [MAIN ] AIS Executive exiting (reason: could not read the main configuration file).
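
If I'm reading that log right, the "local node name ... not found" line is
the root failure and the ccsd "Unable to connect" messages are just a
downstream symptom of aisexec never starting: the box identifies itself by
FQDN while cluster.conf only knows the short names. Two ways I can see to
make them agree (untested sketches, not verified fixes): rename the
clusternodes to FQDNs (a revised config sketch follows the cluster.conf
below), or make the node report the short name cluster.conf already uses,
which on CentOS 5 I believe means:

    [root@test01]# hostname test01
    [root@test01]# sed -i 's/^HOSTNAME=.*/HOSTNAME=test01/' /etc/sysconfig/network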

Here are details of my configuration:

    [root@test01]# rpm -qa | grep cman
    cman-2.0.115-85.el5_7.2

    [root@test01]# echo $HOSTNAME
    test01.gdao.ucsc.edu

    [root@test01]# hostname
    test01.gdao.ucsc.edu

    [root@test01]# cat /etc/hosts
    # Do not remove the following line, or various programs
    # that require network functionality will fail.
    128.114.31.112      test01 test01.gdao test01.gdao.ucsc.edu
    128.114.31.113      test02 test02.gdao test02.gdao.ucsc.edu
    127.0.0.1               localhost.localdomain localhost
    ::1             localhost6.localdomain6 localhost6
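
One detail that may matter: hosts(5) expects the canonical name first on
each line, with aliases after, so as written a reverse lookup of
128.114.31.112 yields the short name "test01" rather than the FQDN. If the
cluster should see FQDNs, I think the entries want to be (my guess at the
intended layout):

    128.114.31.112      test01.gdao.ucsc.edu test01.gdao test01
    128.114.31.113      test02.gdao.ucsc.edu test02.gdao test02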

    [root@test01]# sestatus
    SELinux status:                 enabled
    SELinuxfs mount:                /selinux
    Current mode:                   permissive
    Mode from config file:          permissive
    Policy version:                 21
    Policy from config file:        targeted

    [root@test01]# cat /etc/cluster/cluster.conf
    <?xml version="1.0"?>
    <cluster config_version="25" name="gdao_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="120"/>
        <clusternodes>
            <clusternode name="test01" nodeid="1" votes="1">
                <fence>
                    <method name="single">
                        <device name="gfs_vmware"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="test02" nodeid="2" votes="1">
                <fence>
                    <method name="single">
                        <device name="gfs_vmware"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices>
            <fencedevice agent="fence_manual" name="gfs1_ipmi"/>
            <fencedevice agent="fence_vmware" name="gfs_vmware"
ipaddr="gdvcenter.ucsc.edu" login="root" passwd="1hateAmazon.com"
vmlogin="root" vmpasswd="esxpass"
port="/vmfs/volumes/49086551-c64fd83c-0401-001e0bcd6848/eagle1/gfs1.vmx"/>
        </fencedevices>
        <rm>
            <failoverdomains/>
        </rm>
    </cluster>
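
And here is the revised cluster.conf I'm inclined to try: node names
switched to the FQDNs that uname -n reports, config_version bumped, and
two_node="1" expected_votes="1" added to <cman/>, which as I understand it
a two-node cluster needs in order to be quorate. A sketch, not a config
I've run:

    <?xml version="1.0"?>
    <cluster config_version="26" name="gdao_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="120"/>
        <cman two_node="1" expected_votes="1"/>
        <clusternodes>
            <clusternode name="test01.gdao.ucsc.edu" nodeid="1" votes="1">
                <fence>
                    <method name="single">
                        <device name="gfs_vmware"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="test02.gdao.ucsc.edu" nodeid="2" votes="1">
                <fence>
                    <method name="single">
                        <device name="gfs_vmware"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>
        <fencedevices>
            <!-- same fencedevice entries as above -->
        </fencedevices>
        <rm>
            <failoverdomains/>
        </rm>
    </cluster>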

I've seen much discussion of this problem, but no definitive solutions.
Any help you can provide would be welcome.

Wes Modes



