
[Linux-cluster] lock_gulmd hanging on startup (STABLE, as of 24th running on Debian/Sarge)



Hi

I'm trying to get GFS over GNBD running on Debian/Sarge.
ccsd is running fine (using either IPv4 or IPv6), but lock_gulmd
hangs when it is started. I have enabled IPv6 in my kernel, but didn't
configure any IPv6 addresses. There are, however, link-local IPv6 addresses configured for each interface (Linux seems to add them automatically). I'm running lock_gulmd with the following options:
"-n cluster-ws-sx --use_ccs --name master.ws-sx.cluster.solution-x.com -v ReallyAll".


Any tips & ideas? Any debugging I could do to track this down?

This is what it logs to syslog:
Jun 27 05:15:22 elrond ccsd[795]: Starting ccsd DEVEL.1119711496:
Jun 27 05:15:22 elrond ccsd[795]: Built: Jun 25 2005 16:59:43
Jun 27 05:15:22 elrond ccsd[795]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jun 27 05:15:22 elrond ccsd[795]: IP Protocol:: IPv6 only Multicast (default):: SET
Jun 27 05:15:28 elrond ccsd[795]: cluster.conf (cluster name = cluster-ws-sx, version = 1) found.
Jun 27 05:15:32 elrond lock_gulmd_main[814]: Forked lock_gulmd_core.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Starting lock_gulmd_core DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I am running in Standard mode.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I am (master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:32 elrond lock_gulmd_core[826]: This is cluster cluster-ws-sx
Jun 27 05:15:32 elrond lock_gulmd_core[826]: In state: Pending
Jun 27 05:15:32 elrond lock_gulmd_core[826]: In state: Master
Jun 27 05:15:32 elrond lock_gulmd_core[826]: I see no Masters, So I am becoming the Master.
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Sending Quorum update to slave master.ws-sx.cluster.solution-x.com
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Could not send quorum update to slave master.ws-sx.cluster.solution-x.com
Jun 27 05:15:32 elrond lock_gulmd_core[826]: New generation of server state. (1119842132653336)
Jun 27 05:15:32 elrond lock_gulmd_core[826]: Got heartbeat from master.ws-sx.cluster.solution-x.com at 1119842132653434 (last:1119842132653434 max:0 avg:0)
Jun 27 05:15:33 elrond lock_gulmd_main[814]: Forked lock_gulmd_LT.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: Starting lock_gulmd_LT DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: I am running in Standard mode.
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: I am (master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:33 elrond lock_gulmd_LT[828]: This is cluster cluster-ws-sx
Jun 27 05:15:33 elrond lock_gulmd_LT000[828]: Locktable 0 started.
Jun 27 05:15:34 elrond lock_gulmd_main[814]: Forked lock_gulmd_LTPX.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: Starting lock_gulmd_LTPX DEVEL.1119711496. (built Jun 25 2005 17:00:28) Copyright (C) 2004 Red Hat, Inc. All rights reserved.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: I am running in Standard mode.
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: I am (master.ws-sx.cluster.solution-x.com) with ip (::ffff:10.100.20.1)
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: This is cluster cluster-ws-sx
Jun 27 05:15:34 elrond lock_gulmd_LTPX[831]: ltpx started.


ps auxwww | grep gulm gives:
root 826 0.0 0.1 2008 840 ? S<s 05:15 0:00 lock_gulmd_core --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll
root 828 0.0 0.1 2008 820 ? S<s 05:15 0:00 lock_gulmd_LT --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll
root 831 0.0 0.1 2008 820 ? S<s 05:15 0:00 lock_gulmd_LTPX --cluster_name cluster-ws-sx --servers ::ffff:10.100.20.1 --name master.ws-sx.cluster.solution-x.com --verbosity ReallyAll


And finally, strace shows all three PIDs stuck in a recv() call on fd 6.
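
To figure out what fd 6 actually is, I can do something like this (826 is the lock_gulmd_core PID from the ps output above, same idea for 828 and 831; these are just generic diagnostics, not gulm tooling):

# show what file descriptor 6 points to (it should be a socket inode)
ls -l /proc/826/fd/6
# map lock_gulmd's sockets to addresses, ports and states
netstat -anp | grep lock_gulmd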

Here is my cluster.conf:
<cluster name="cluster-ws-sx" config_version="1">
<gulm>
<lockserver name="master.ws-sx.cluster.solution-x.com"/>
</gulm>
<clusternodes>
<clusternode name="master.ws-sx.cluster.solution-x.com">
<method name="single">
<device name="gnbd" nodename="master.ws-sx.cluster.solution-x.com"/>
</method>
</clusternode>


<clusternode name="s1.ws-sx.cluster.solution-x.com">
<method name="single">
<device name="gnbd" nodename="s1.ws-sx.cluster.solution-x.com"/>
</method>
</clusternode>
</clusternodes>


<fencedevices>
<fencedevice name="gnbd" agent="fence_gnbd" servers="10.100.20.1"/>
</fencedevices>
</cluster>
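
As a basic sanity check I can at least verify the file is well-formed XML (assuming it lives in the usual /etc/cluster/cluster.conf location; xmllint is from libxml2-utils):

# prints nothing if the file is well-formed
xmllint --noout /etc/cluster/cluster.conf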


greetings, Florian Pflug

