[Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)

Fabio Massimo Di Nitto fabbione at ubuntu.com
Fri Nov 24 11:01:22 UTC 2006


Hi guys,

I found a corner case where calling fence_tool -w leave will (or at least
might) hang. My setup is a 2-node cluster:

- both nodes are up
- poweroff the first one -> OK
- reboot the second one -> OK
- the second node comes up again:

cman_tool services will show:
fence            0     default  00040001 JOIN_START_WAIT

Since the first node is "dead", the fence domain never completes the switch
to state = none.

If you call fence_tool -w leave at this point, it will hang there forever.

In my init scripts I just changed fence_stop() to use the usual "wait 10
seconds or die" kind of loop:

         fence_tool -w leave &
         for sec in $(seq 1 10); do
                 if pidof fence_tool &> /dev/null; then
                         if [ "$sec" = 10 ]; then
                                 kill $(pidof fence_tool) > /dev/null 2>&1
                         else
                                 sleep 1
                         fi
                 fi
         done
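
For illustration, the same "wait N seconds or die" idea can also be sketched
as a small helper that tracks the background PID via $! instead of pidof, so
it only ever kills the fence_tool it started itself. This is a rough sketch
and not what is in the init script: the name run_with_timeout and the 124
return code are made up for this example.

```shell
# Hypothetical helper (name and return code are assumptions, not part of
# any cman init script): run a command in the background and kill it if
# it has not returned after the given number of seconds.
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_timeout() {
    rwt_timeout=$1
    shift

    "$@" &
    rwt_pid=$!
    rwt_elapsed=0

    # kill -0 only checks whether the process still exists
    while kill -0 "$rwt_pid" 2>/dev/null; do
        if [ "$rwt_elapsed" -ge "$rwt_timeout" ]; then
            kill "$rwt_pid" 2>/dev/null || true
            wait "$rwt_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        rwt_elapsed=$((rwt_elapsed + 1))
    done

    # the command finished on its own: hand back its exit status
    wait "$rwt_pid"
}
```

In fence_stop() this would be called as "run_with_timeout 10 fence_tool -w leave".
As with the loop above, it only waits for the tool to return, not for fenced
to exit.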

Regards
Fabio

PS: I spotted this problem when updating the Ubuntu init scripts, but the code
used in the upstream init script seems to suffer from the exact same problem.
Also note that I am not checking for fenced to exit, but for the tool to return.

-- 
I'm going to make him an offer he can't refuse.



