[Linux-cluster] fencing loop in a 2-node partitioned cluster

Marc Grimme grimme at atix.de
Tue Feb 24 19:09:44 UTC 2009


On Tuesday 24 February 2009 16:59:26 Gianluca Cecchi wrote:
> thanks, but where do I have to put the timeout?
> Inside the fence section of the nodes:
>                         <fence>
>                                 <method name="1">
>                                         <device name="ilonode01"/>
>                                 </method>
>                         </fence>
>
> or inside definition of fence devices:
>         <fencedevices>
>                 <fencedevice agent="fence_ilo" hostname="10.4.192.208"
> login="fenceuser" name="ilonode01" passwd="rhelclasi"/>
>                 <fencedevice agent="fence_ilo" hostname="10.4.192.209"
> login="fenceuser" name="ilonode02" passwd="rhelclasi"/>
>         </fencedevices>
>
> ?
> It is frustrating that it is always impossible to check the syntax and
> parameters of this mysterious cluster.conf file... ;-(
This time you're lucky, because it's just a fenced option:

[root@generix2 ~]# fenced -h
Usage:

fenced [options]

Options:

  -c           All nodes are in a clean state to start
  -j <secs>     Post-join fencing delay (default 6)
  -f <secs>     Post-fail fencing delay (default 0)
  -O <path>    Override path (default /var/run/cluster/fenced_override)
  -D           Enable debugging code and don't fork
  -h           Print this help, then exit
  -V           Print program version information, then exit

Command line values override those in cluster.conf.
For an unbounded delay use <secs> value of -1.
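
For completeness: the post-join/post-fail delays can also be set cluster-wide
in cluster.conf via the fence_daemon element (attribute names from memory,
double-check them against your docs), e.g.:

  <fence_daemon post_join_delay="6" post_fail_delay="10"/>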

But you don't want the same value on all nodes anyway (cluster.conf is
shared, so the same delay would apply everywhere). So add the -f option to
the fenced call in the start_daemons function of the /etc/init.d/cman
initscript. As there is no variable like FENCED_FAIL_DELAY, you have to edit
the script itself ;( .
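
An untested sketch of what I mean; how exactly fenced gets started differs
between cman versions, and the hostnames below are made up, so adapt it to
whatever your initscript actually does:

  # inside start_daemons() in /etc/init.d/cman: replace the plain
  # fenced invocation with something along these lines
  case "$(hostname -s)" in
      node01) FAIL_DELAY=0  ;;   # this node should win a fencing race
      *)      FAIL_DELAY=10 ;;   # the other node backs off first
  esac
  fenced -f "$FAIL_DELAY"

That way each node starts fenced with its own post-fail delay.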

Marc.
>
> Thanks
> Gianluca
>
> On Tue, Feb 24, 2009 at 4:01 PM, Marc Grimme <grimme at atix.de> wrote:
> > We've solved this problem by using fence timeouts that depend on the
> > nodeid. That means node0 gets timeout=0 and node1 gets timeout=10, so
> > node0 will always survive. That's not the optimal way, but it works.
> > Or use qdiskd, let it detect the network partitioning (wherever it
> > happens) and decide which side should survive by a heuristic.
> > Marc.
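
PS: for the qdiskd variant quoted above, the heuristic goes into cluster.conf
roughly like this (label, timings and the ping target are placeholders, see
the qdisk(5) man page for the real attribute list):

  <quorumd interval="1" tko="10" votes="1" label="myqdisk">
          <heuristic program="ping -c1 -w1 10.4.192.1" score="1" interval="2" tko="3"/>
  </quorumd>

The idea is that the node which can still reach the ping target keeps its
quorum disk vote and survives the partition, instead of both nodes racing to
fence each other.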



-- 
Gruss / Regards,

Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/               http://www.open-sharedroot.org/

ATIX Informationstechnologie und Consulting AG | Einsteinstrasse 10 |
85716 Unterschleissheim | www.atix.de | www.open-sharedroot.org

Commercial register: Amtsgericht Muenchen, registration number HRB 168930, VAT ID:
DE209485962 | Executive board: Marc Grimme, Mark Hlawatschek, Thomas Merz (chairman) |
Chairman of the supervisory board: Dr. Martin Buss



