
[Linux-cluster] [cluster-linux] rejoining cluster after being fenced



Hello,

 

I currently run a two-node RHEL 5.1 cluster with openais 0.80.3-13 and cman 2.0.80-1.

 

Both nodes are hosted on VMware ESX 3.0.2 servers. Fencing works fine, but here is my issue:

 

Whenever I simulate the failure of a node (shutting down eth0 or doing a hard reboot), the node is fenced, but it can never rejoin the cluster afterwards. This is what node 1 logs when node 2 comes back:

 

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering COMMIT state.

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering RECOVERY state.

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] position [0] member 10.148.46.50:

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] previous ring seq 7692 rep 10.148.46.50

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] aru c high delivered c received flag 1

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] position [1] member 10.148.46.51:

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] previous ring seq 7688 rep 10.148.46.51

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] aru b high delivered b received flag 1

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] Did not need to originate any messages in recovery.

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] Sending initial ORF token

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] CLM CONFIGURATION CHANGE

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] New Configuration:

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ]      r(0) ip(10.148.46.50) 

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Left:

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Joined:

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] CLM CONFIGURATION CHANGE

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] New Configuration:

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ]      r(0) ip(10.148.46.50) 

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ]      r(0) ip(10.148.46.51) 

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Left:

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ] Members Joined:

Mar 17 14:24:32 VMClutest01 openais[1941]: [CLM  ]      r(0) ip(10.148.46.51) 

Mar 17 14:24:32 VMClutest01 openais[1941]: [SYNC ] This node is within the primary component and will provide service.

Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering OPERATIONAL state.

Mar 17 14:24:32 VMClutest01 openais[1941]: [MAIN ] Killing node VMClutest02 because it has rejoined the cluster with existing state
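To compare what each node logs around a rejoin attempt, the kill messages can be pulled out of syslog with a few lines of Python. This is only a minimal sketch: the regular expressions are my assumption based on the line format shown above, and `find_kills` is a hypothetical helper name, not part of openais or cman.

```python
import re

# Assumed pattern, derived from the openais syslog lines quoted above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+openais\[(?P<pid>\d+)\]:\s+"
    r"\[(?P<sub>[A-Z ]+)\]\s*(?P<msg>.*)$"
)
KILL_RE = re.compile(r"Killing node (?P<node>\S+) because (?P<reason>.+)")


def find_kills(lines):
    """Return (timestamp, logging host, killed node, reason) for each kill message."""
    kills = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only the [MAIN ] subsystem emits the kill message in the excerpt above.
        if not m or m.group("sub").strip() != "MAIN":
            continue
        k = KILL_RE.search(m.group("msg"))
        if k:
            kills.append((m.group("ts"), m.group("host"), k.group("node"), k.group("reason")))
    return kills


# Two lines copied from the log excerpt above.
sample = [
    "Mar 17 14:24:32 VMClutest01 openais[1941]: [TOTEM] entering OPERATIONAL state.",
    "Mar 17 14:24:32 VMClutest01 openais[1941]: [MAIN ] Killing node VMClutest02 "
    "because it has rejoined the cluster with existing state",
]

for ts, host, node, reason in find_kills(sample):
    print(f"{ts}: {host} killed {node} ({reason})")
```

Running the same function over the log of each node would show which node issues the kill and at what time.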

 

 

Is there anything to do after a node failure to make the node rejoin the cluster in a "clean" state?

If I try to cleanly restart node 2 with "shutdown -r now", it hangs while stopping the cluster services.

If I hard reboot node 2, it can never rejoin the cluster, and the log is the same as above.

 

 

My cluster.conf:

 

<?xml version="1.0"?>

<cluster alias="TestClu01" config_version="9" name="TestClu01"><fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>

<clusternodes>

<clusternode name="VMClutest01" nodeid="1" votes="1">

<fence><method name="FENCESX"><device name="ESX01"/></method>

</fence>

</clusternode>

<clusternode name="VMClutest02" nodeid="2" votes="1">

<fence><method name="FENCESX"><device name="ESX02"/></method>

</fence>

</clusternode>

</clusternodes>

<cman expected_votes="1" two_node="1"/>

<fencedevices>

<fencedevice name="ESX01" agent="fence_vi3" ipaddr="10.148.45.206" port="VMClutest01" login="" passwd=" "/>

<fencedevice name="ESX02" agent="fence_vi3" ipaddr="10.148.45.206" port="VMClutest02" login="" passwd=" "/>

</fencedevices>

 <rm>

<failoverdomains>

<failoverdomain name="AppCluster" ordered="0" restricted="0">

<failoverdomainnode name="VMClutest01" priority="1"/>

<failoverdomainnode name="VMClutest02" priority="1"/>

</failoverdomain>

</failoverdomains>

<resources>

<ip address="10.148.46.55" monitor_link="1"/>

</resources>

<service autostart="1" domain="AppCluster" exclusive="0" name="AppServer" recovery="restart">

<ip ref="10.148.46.55"/>

</service>

</rm>

<totem consensus="4800" join="1000" token="5000" token_retransmits_before_loss_const="20"/>

</cluster>
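For anyone who wants to check the relevant settings quickly, here is a rough Python sketch that summarizes the quorum, fencing, and totem attributes of a configuration like the one above. The `summarize` function and the trimmed `CONF` string are only illustrations (the `<rm>` section and the fence device details are left out); the attribute values are copied from my cluster.conf.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration above (illustration only).
CONF = """<cluster alias="TestClu01" config_version="9" name="TestClu01">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="VMClutest01" nodeid="1" votes="1">
      <fence><method name="FENCESX"><device name="ESX01"/></method></fence>
    </clusternode>
    <clusternode name="VMClutest02" nodeid="2" votes="1">
      <fence><method name="FENCESX"><device name="ESX02"/></method></fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <totem consensus="4800" join="1000" token="5000" token_retransmits_before_loss_const="20"/>
</cluster>"""


def summarize(xml_text):
    """Collect the quorum, fence daemon, totem, and per-node fencing settings."""
    root = ET.fromstring(xml_text)
    nodes = {
        node.get("name"): [dev.get("name") for dev in node.findall("./fence/method/device")]
        for node in root.findall("./clusternodes/clusternode")
    }
    return {
        "cman": dict(root.find("cman").attrib),
        "fence_daemon": dict(root.find("fence_daemon").attrib),
        "totem": dict(root.find("totem").attrib),
        "fence_devices_per_node": nodes,
    }


for section, values in summarize(CONF).items():
    print(section, values)
```

The same function should work on the full file by passing it the contents of the cluster.conf on the node, provided the text is read without a leading blank line before the XML declaration.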

 

 

 

Any ideas?

 

Mathieu

 

 

 

