
Re: [Linux-cluster] fence 'node1' failed if eth0 down



Hi,

First, thank you very much for your answer.

You are right, I have no fencing devices at all, but for one reason: I don't have any!

I'm just testing with two Xen virtual machines running on the same host and mounting an iSCSI disk from another host to simulate shared storage.

On the other hand, I don't think I understand the concept of fencing.

I tried to configure fencing devices with luci, but I don't know what to select from the combo box of fencing devices (perhaps manual fencing, although it's not recommended for production).

So, as I think this is a newbie and perhaps silly question:

Can you give me a good reference about fencing to learn from, or an example configuration with fence devices, to see how it should be done?

Thanks again,

ESG


2009/2/17 spods iinet net au <spods iinet net au>
A couple of things.

You don't have any fencing devices defined in cluster.conf at all.  No power
fencing, no I/O fencing, not even manual fencing.

You need to define how each node of the cluster is to be fenced (forcibly removed
from the cluster) for proper failover operations to occur.
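Since the nodes in this case are Xen guests on one host, a minimal sketch of what the missing fence configuration might look like would use the fence_xvm agent. The device name "xenfence" and the use of fence_xvm here are illustrative assumptions, not taken from the original config:

```xml
<!-- Sketch only: assumes fence_xvm is set up on the Xen host (fence_xvmd
     running in dom0) and that the guest domain names match the node names. -->
<clusternode name="node1" nodeid="1" votes="1">
        <fence>
                <method name="1">
                        <device name="xenfence" domain="node1"/>
                </method>
        </fence>
</clusternode>
...
<fencedevices>
        <fencedevice agent="fence_xvm" name="xenfence"/>
</fencedevices>
```

The per-node `<fence>` block says *how* that node is fenced; the `<fencedevices>` section defines the device it refers to.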

Secondly, if the only connection shared between the two nodes is the network cord
you just disconnected, then of course nothing will happen - each node has just
lost the only common connection between each other to control the faulty node
(i.e. through fencing).

There need to be more connections between the nodes of a cluster than just a
network card.  This can be achieved with a second NIC, I/O fencing, or centralised
or individual power controls (I/O switches or IPMI).
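For a physical deployment, a per-node IPMI power-fencing entry in cluster.conf might look like the following. The address, credentials, and device name are placeholders, not values from the original setup:

```xml
<!-- Sketch only: one fence_ipmilan device per node, pointing at that
     node's management (BMC) interface. All values are placeholders. -->
<fencedevices>
        <fencedevice agent="fence_ipmilan" name="node1-ipmi"
                     ipaddr="192.168.1.250" login="admin" passwd="secret"/>
        <fencedevice agent="fence_ipmilan" name="node2-ipmi"
                     ipaddr="192.168.1.251" login="admin" passwd="secret"/>
</fencedevices>
```

Because the BMC network is separate from the cluster interconnect, a node can still be power-cycled even when its eth0 is down.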

That way, in the event that the network connection is the single point of failure
between the two nodes, a node can at least be fenced if it's behaving improperly.

Once the faulty node is fenced, the remaining nodes can then continue
providing cluster services.

Regards,

Stewart




On Mon Feb 16 16:29, ESGLinux sent:

>Hello All,
>
>I have a cluster with two nodes running one service (mysql). The two nodes use an iSCSI disk with GFS on it.
>I haven't configured fencing at all.
>
>I have tested different failure situations and these are my results:
>
>
>If I halt node1, the service relocates to node2 - OK
>if I kill the process on node1, the service relocates to node2 - OK
>
>but
>
>if I unplug the Ethernet cable or run ifdown eth0 on node1, the whole cluster fails. The service doesn't relocate.
>
>In node2 I get the messages:
>
>Feb 15 13:29:34 localhost fenced[3405]: fencing node "192.168.1.188"
>Feb 15 13:29:34 localhost fenced[3405]: fence "192.168.1.188" failed
>Feb 15 13:29:39 localhost fenced[3405]: fencing node "192.168.1.188"
>
>Feb 15 13:29:39 localhost fenced[3405]: fence "192.168.1.188" failed
>
>again and again. Node2 never runs the service, and if I try to reboot node1, the machine hangs waiting for the services to stop.
>
>
>In this situation all I can do is switch off the power of node1 and reboot node2. This is not acceptable at all.
>
>I think the problem is just with fencing, but I don't know how to apply it to this situation (I have RTFM on the Red Hat site, but I haven't seen how to apply it :-( )
>
>
>this is my cluster.conf file
>
><cluster alias="MICLUSTER" config_version="62" name="MICLUSTER">
>        <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>
>        <clusternodes>
>                <clusternode name="node1" nodeid="1" votes="1">
>                        <fence/>
>                </clusternode>
>                <clusternode name="node2" nodeid="2" votes="1">
>
>                        <fence/>
>                </clusternode>
>        </clusternodes>
>        <cman expected_votes="1" two_node="1"/>
>        <fencedevices/>
>
>        <rm>
>                <failoverdomains>
>                        <failoverdomain name="DOMINIOFAIL" nofailback="0" ordered="0" restricted="1">
>                                <failoverdomainnode name="node1" priority="1"/>
>
>                                <failoverdomainnode name="node2" priority="1"/>
>                        </failoverdomain>
>                </failoverdomains>
>                <resources/>
>
>                <service domain="DOMINIOFAIL" exclusive="0" name="BBDD" recovery="restart">
>                        <mysql config_file="/etc/my.cnf" listen_address="" mysql_options="" name="mydb" shutdown_wait="3"/>
>
>                        <ip address="192.168.1.183" monitor_link="1"/>
>                </service>
>        </rm>
></cluster>
>
>Any idea? references?
>
>Thanks in advance
>
>
>Greetings
>
>ESG
>
>
>
>



