
Re: [Linux-cluster] all nodes halt when one lose connection



Thanks for your answers, 

I have separated the management and service networks onto two switches, and now it works fine.

Thanks again, 

ESG

2009/5/28 Kaerka Phillips <kbphillips80 gmail com>
One thing we did not try, but which might have worked, is to bond two network interfaces together, put a VLAN-tagged interface on top of the bond to reach the other node, and point the cluster at the VLAN interfaces. Those should stay up even if one network interface or one switch fails.
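A minimal sketch of what that bond-plus-VLAN layout could look like as RHEL-style ifcfg files, generated from Python. The interface names (eth0, eth1, bond0), VLAN ID 100, the 10.10.10.1/24 address and the bonding options are assumptions for illustration only. Active-backup mode is used because it needs no cooperation from the switches, which matters when the two NICs go to two different switches; it also assumes both switches carry the VLAN and are linked to each other.

```python
import ipaddress


def bonded_vlan_ifcfg(slaves, vlan_id, address, bond="bond0",
                      bonding_opts="mode=active-backup miimon=100"):
    """Return {filename: contents} for a bond over `slaves` with a tagged VLAN on top.

    All names, the VLAN ID and the address are illustrative assumptions.
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid VLAN ID: {vlan_id}")
    iface = ipaddress.ip_interface(address)  # e.g. "10.10.10.1/24"
    files = {}
    # Physical NICs: no address of their own, enslaved to the bond.
    for nic in slaves:
        files[f"ifcfg-{nic}"] = "\n".join([
            f"DEVICE={nic}", "ONBOOT=yes", "BOOTPROTO=none",
            f"MASTER={bond}", "SLAVE=yes",
        ]) + "\n"
    # Bond device: carries no address itself, only the bonding mode.
    files[f"ifcfg-{bond}"] = "\n".join([
        f"DEVICE={bond}", "ONBOOT=yes", "BOOTPROTO=none",
        f'BONDING_OPTS="{bonding_opts}"',
    ]) + "\n"
    # VLAN interface on top of the bond: the cluster address lives here.
    vlan_dev = f"{bond}.{vlan_id}"
    files[f"ifcfg-{vlan_dev}"] = "\n".join([
        f"DEVICE={vlan_dev}", "ONBOOT=yes", "BOOTPROTO=static", "VLAN=yes",
        f"IPADDR={iface.ip}", f"NETMASK={iface.netmask}",
    ]) + "\n"
    return files


if __name__ == "__main__":
    for name, text in bonded_vlan_ifcfg(["eth0", "eth1"], 100, "10.10.10.1/24").items():
        print(f"# {name}\n{text}")
```

Each file would go in /etc/sysconfig/network-scripts/ on one node; the other node would get a different address on the same subnet, and the cluster node names would need to resolve to the VLAN addresses.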


On Wed, May 27, 2009 at 7:48 PM, Kaerka Phillips <kbphillips80 gmail com> wrote:
It sounds like they're fencing themselves. We got around this issue on a two-node cluster by adding the other node's internal IP address to the /etc/hosts file on both hosts, and by using a crossover cable, with private IP addresses assigned, for the service network. If you're trying to get them to monitor each other via the public network, in theory this could be done with a backup fencing method, but we weren't able to get that to work, because the heartbeat only runs on the network that the node names are defined to use.
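Since the heartbeat follows whatever address each node name resolves to, a quick sanity check is to confirm that every cluster node name in /etc/hosts points at the private interconnect. A minimal sketch, assuming a simple hosts file (no `multi on` tricks) and a hypothetical interconnect subnet passed in by the caller; the helper names are hypothetical too.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname/alias to its address, like a simple /etc/hosts reader."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name, addr)  # keep the first entry seen for a name
    return table


def check_interconnect(hosts_text, node_names, interconnect):
    """List cluster node names that do not resolve onto the private interconnect."""
    net = ipaddress.ip_network(interconnect)
    table = parse_hosts(hosts_text)
    problems = []
    for node in node_names:
        addr = table.get(node)
        if addr is None:
            problems.append(f"{node}: not listed in hosts file")
        elif ipaddress.ip_address(addr) not in net:
            problems.append(f"{node}: resolves to {addr}, outside {net}")
    return problems
```

For example, `check_interconnect(open("/etc/hosts").read(), ["node1", "node2"], "192.168.100.0/24")` returns an empty list when both (hypothetical) node names sit on the crossover link.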


On Mon, May 25, 2009 at 5:28 AM, ESGLinux <esggrupos gmail com> wrote:
Hi, 

I don't think this is my problem, because fencing works fine. The nodes get fenced immediately; the trouble is that they get fenced when they shouldn't be.

Greetings, 

ESG

2009/5/22 jorge sanchez <xsanch gmail com>

Hi,

Also try disabling ACPI if it is running; see the following:

http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5/html/Cluster_Administration/s1-acpi-CA.html
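As I understand the linked guide, the point is that if the fence device works by cutting power, a node with ACPI Soft-Off active may treat that as a power-button press and do a clean shutdown instead of dropping immediately, which would fit the "shutting down for system halt" lines in the logs further down. One of the options the guide covers is booting with `acpi=off`. A minimal sketch that checks only for that kernel parameter and, best-effort, for a running acpid process; it does not look at BIOS settings, and both helper names are hypothetical.

```python
import os


def acpi_disabled(cmdline):
    """True if the kernel command line contains the acpi=off parameter."""
    return "acpi=off" in cmdline.split()


def acpid_running(proc="/proc"):
    """Best-effort check for a running acpid process by scanning /proc/<pid>/stat."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name sits between the first "(" and the last ")".
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == "acpid":
            return True
    return False
```

On a node, `acpi_disabled(open("/proc/cmdline").read())` and `acpid_running()` together give a rough picture of whether a power-off fence is likely to be turned into a graceful shutdown.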


Regards,

Jorge Sanchez


On Thu, May 21, 2009 at 5:34 PM, ESGLinux <esggrupos gmail com> wrote:


2009/5/21 Jonathan Brassow <jbrassow redhat com>

On May 21, 2009, at 9:57 AM, ESGLinux wrote:

Hello,

These are the logs I get:

On NODE1:

May 21 11:33:44 NODE1 fenced[3840]: NODE2 not a cluster member after 5 sec post_fail_delay
May 21 11:33:44 NODE1 fenced[3840]: fencing node "NODE2"
May 21 11:33:44 NODE1 shutdown[5448]: shutting down for system halt

On NODE2:

May 21 11:33:45 NODE2 fenced[3843]: NODE1 not a cluster member after 5 sec post_fail_delay
May 21 11:33:45 NODE2 fenced[3843]: fencing node "NODE1"
May 21 11:33:45 NODE2 shutdown[5923]: shutting down for system halt
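The two logs show each node declaring the other "not a cluster member" about a second apart and then fencing it, which looks like both nodes lost sight of each other at the same moment and raced to fence. A minimal sketch that spots this pattern in syslog lines of the form shown above; the regex only covers that `fenced[...]: fencing node "..."` format, the 10-second window is an arbitrary assumption, and syslog omits the year, so one is supplied.

```python
import re
from datetime import datetime

FENCE_RE = re.compile(
    r'^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) fenced\[\d+\]: '
    r'fencing node "(?P<victim>[^"]+)"'
)


def fence_events(lines, year=2009):
    """Extract (timestamp, fencing node, fenced node) from fenced syslog lines."""
    events = []
    for line in lines:
        m = FENCE_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["host"], m["victim"]))
    return events


def mutual_fencing(lines, window_s=10):
    """Return node pairs that tried to fence each other within `window_s` seconds."""
    events = fence_events(lines)
    pairs = set()
    for t1, a, b in events:
        for t2, c, d in events:
            if (a, b) == (d, c) and abs((t2 - t1).total_seconds()) <= window_s:
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)
```

Feeding it the six log lines above (from both nodes) would flag the pair NODE1/NODE2.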


What I don't understand is why they lose their connection to the cluster; they are still connected (I only unplug a cable from the service network).

That may be something worth chasing down, as it appears that your cluster communication is on a network you don't expect?

How can I be sure which network the nodes are using for communication? I believe they use the network I configured for that...
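One way to answer that, assuming (as the earlier reply in this thread suggests) that the cluster talks over whichever address the node name resolves to: resolve the node name and see which local interface's subnet contains that address. If it lands on eth0 rather than eth1, unplugging eth0 would cut cluster traffic. A minimal sketch; the interface table is something you would fill in by hand (for example from `ip -4 addr`), and the addresses shown are hypothetical.

```python
import ipaddress
import socket


def interface_for_node(node_name, interfaces, resolve=socket.gethostbyname):
    """Return the local interface whose subnet contains the node name's address.

    `interfaces` maps interface names to CIDR strings, e.g. {"eth1": "192.168.100.11/24"}.
    `resolve` is injectable so the lookup can be tested without DNS or /etc/hosts.
    """
    addr = ipaddress.ip_address(resolve(node_name))
    for name, cidr in interfaces.items():
        if addr in ipaddress.ip_interface(cidr).network:
            return name
    return None  # the name resolves to a network no local interface is on
```

On NODE1, something like `interface_for_node("NODE1", {"eth0": "10.0.0.11/24", "eth1": "192.168.100.11/24"})` should return `"eth1"` if the cluster is really on the intended network.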
 

Also, are the nodes simply "shutting down", or are they being forcibly rebooted? If it is a clean shutdown, then it would appear that both nodes are trying to shut down simultaneously.

They simply shut down; they do not reboot.

This is what I get every time I unplug the network cable from eth0 on either of the two nodes (they communicate through eth1...).

Greetings,

ESG



 


--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster
