
[Linux-cluster] fencing problem in 2 node cluster using apc fence device


I'm currently configuring fencing devices for the two nodes of my RHEL4 cluster. The explanation is a bit long, so please bear with me.

I have two nodes (let's call them stone1 and stone2) and two APC fencing devices (pdu1 and pdu2, both APC 7952 units). Both stone1 and stone2 have dual power supplies. Stone1's power supplies are connected to outlet 13 on pdu1 and outlet 13 on pdu2; stone2's power supplies are connected to outlet 20 on both PDUs. My question is this: when configuring fencing for each node, I need to specify which fence devices to add to that node's fence level. Is it correct to configure stone1 as follows: pdu1 -> port=13, switch=1 and pdu2 -> port=13, switch=2? And likewise stone2: pdu1 -> port=20, switch=1 and pdu2 -> port=20, switch=2?
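To make the wiring concrete, here is a small Python sketch that models which PDU outlets feed each node. The names POWER_FEEDS, outlets_to_cut and node_is_powered are hypothetical and are not part of any cluster tool; the sketch only encodes the outlet mapping described above. It shows why dual power supplies matter for fencing: switching off a single outlet leaves the node running on its other supply, so both outlets must be off before the node actually loses power.

```python
# Hypothetical model of the power wiring described above (not a cluster tool):
# each node's two power supplies are fed from the same outlet number on pdu1 and pdu2.
from typing import Dict, List, Set, Tuple

Feed = Tuple[str, int]  # (PDU name, outlet number)

# node -> one (pdu, outlet) entry per power supply
POWER_FEEDS: Dict[str, List[Feed]] = {
    "stone1": [("pdu1", 13), ("pdu2", 13)],
    "stone2": [("pdu1", 20), ("pdu2", 20)],
}


def outlets_to_cut(node: str) -> List[Feed]:
    """Return every (pdu, outlet) pair that must be off to remove all power from node."""
    return list(POWER_FEEDS[node])


def node_is_powered(node: str, outlets_off: Set[Feed]) -> bool:
    """A node keeps running while at least one of its supplies still has power."""
    return any(feed not in outlets_off for feed in POWER_FEEDS[node])


if __name__ == "__main__":
    # Cutting only pdu1's outlet leaves stone1 running on its second supply.
    print(node_is_powered("stone1", {("pdu1", 13)}))                 # True
    # Cutting both outlet 13s removes all power from stone1.
    print(node_is_powered("stone1", set(outlets_to_cut("stone1"))))  # False
```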

After configuring this, with both nodes running in the cluster and my application running on stone1, I pulled the Ethernet cables out of stone1 to simulate the server going down. In principle, my application should fail over to stone2 and stone1 should be fenced (i.e., rebooted or shut down). Instead, my application was started on stone2, but stone1 was never fenced. In fact, when I reconnected the cables, my application was still running on stone1! It seems that two instances of my application were running, one on stone1 and one on stone2.
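The behaviour expected above can be written as a simplified Python model. The Cluster class and handle_node_loss method are hypothetical and are not how rgmanager is actually implemented; the sketch assumes the usual fencing rule that the surviving node recovers the application only after the failed node has been fenced, and that recovery waits if fencing fails rather than starting a second copy. The test described above matches neither branch, since the application started on stone2 while stone1 was never fenced.

```python
# Simplified, hypothetical model of the expected order of events when a node
# drops out of a two-node cluster. This is not rgmanager's actual code.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    service_owner: str
    log: List[str] = field(default_factory=list)

    def handle_node_loss(self, failed: str, survivor: str,
                         fence: Callable[[str], bool]) -> None:
        """Move the service to the survivor only after the failed node is fenced."""
        if fence(failed):
            self.log.append(f"fenced {failed}")
            if self.service_owner == failed:
                self.service_owner = survivor
                self.log.append(f"service started on {survivor}")
        else:
            # Without a successful fence the failed node may still be running
            # the service, so starting a second copy would be unsafe.
            self.log.append(f"fencing {failed} failed; recovery blocked")


if __name__ == "__main__":
    ok = Cluster(service_owner="stone1")
    ok.handle_node_loss("stone1", "stone2", fence=lambda node: True)
    print(ok.service_owner)  # stone2

    blocked = Cluster(service_owner="stone1")
    blocked.handle_node_loss("stone1", "stone2", fence=lambda node: False)
    print(blocked.service_owner)  # stone1
```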

Why did fencing fail? I've read somewhere that the acpid service plays a part and that I need to disable it. Is that true? Also, when I check /var/log/messages, I see a "cman :sendmsg failed -101" error. What does this mean?
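Assuming the -101 in that message is a negated errno value (the usual Linux kernel convention for return codes), it can be decoded with Python's standard errno and os modules. The helper name decode_kernel_error is hypothetical. Errno numbers are platform specific: on Linux, 101 is ENETUNREACH ("Network is unreachable"), which would be consistent with stone1's network cables having been pulled, but the same number means something else on other operating systems.

```python
# Decode a negative kernel-style return code such as the -101 in
# "cman :sendmsg failed -101". Assumes the value is a negated errno;
# errno numbers differ between operating systems (101 is ENETUNREACH on Linux).
import errno
import os
from typing import Tuple


def decode_kernel_error(code: int) -> Tuple[str, str]:
    """Map a (possibly negated) errno value to its symbolic name and message."""
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return name, os.strerror(num)


if __name__ == "__main__":
    # On Linux with glibc this prints ('ENETUNREACH', 'Network is unreachable').
    print(decode_kernel_error(-101))
```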

I've been trying to solve this problem for the past few days without success. Any advice would be appreciated.
