
Re: [Cluster-devel] Possible problem with cman init script in CVS HEAD (fence related)



Fabio Massimo Di Nitto wrote:
Hi guys,

I found a corner case where calling fence_tool -w leave may hang.
My setup is a 2-node cluster:

- both nodes are up
- poweroff the first one -> OK
- reboot the second one -> OK
- the second node comes up again:

cman_tool services will show:
fence            0     default  00040001 JOIN_START_WAIT

Since the first node is "dead", the fence domain never completes the switch to
state = none.

If you call fence_tool -w leave at that point, it will hang there forever.

In my init scripts I just changed fence_stop() to use the usual "wait 10
seconds or die" kind of loop:

         fence_tool -w leave &
         for sec in $(seq 1 10); do
                 # stop polling once fence_tool has returned
                 pidof fence_tool > /dev/null 2>&1 || break
                 if [ "$sec" = 10 ]; then
                         # still running after the grace period: give up
                         kill $(pidof fence_tool) > /dev/null 2>&1
                 else
                         sleep 1
                 fi
         done

Regards
Fabio

PS: I spotted this problem when updating the Ubuntu init scripts, but the code
used in the upstream init script seems to suffer from the exact same problem. You
may also want to note that I am not checking for fenced to exit, but for the
tool to return.

Hi Fabio,

You should be able to do the same thing by specifying -t 10 for a ten-second timeout
on fence_tool.  For example:

fence_tool -t 10 -w leave

The default timeout value is five minutes, which means the hang shouldn't last
forever at any rate.

Regards,

Bob Peterson
Red Hat Cluster Suite
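
For illustration, the -t suggestion above might be folded into an init script's
fence_stop() roughly as follows. This is only a sketch, not the upstream script:
the function body and the FENCE_TOOL and FENCE_LEAVE_TIMEOUT variables are
assumptions made for the example, and it assumes fence_tool returns a non-zero
exit status when the leave fails or times out.

```shell
# Hypothetical fence_stop() for an init script; not the upstream code.
# FENCE_TOOL and FENCE_LEAVE_TIMEOUT are illustrative names.
FENCE_TOOL=${FENCE_TOOL:-fence_tool}
FENCE_LEAVE_TIMEOUT=${FENCE_LEAVE_TIMEOUT:-10}

fence_stop()
{
        # -w waits for the leave to complete, -t bounds that wait in seconds
        if "$FENCE_TOOL" -t "$FENCE_LEAVE_TIMEOUT" -w leave > /dev/null 2>&1; then
                return 0
        fi
        # Assumption: a non-zero exit status means failure or timeout
        echo "fence_tool leave failed or timed out after ${FENCE_LEAVE_TIMEOUT}s" >&2
        return 1
}
```

Compared with the background-and-poll loop, this keeps the timeout inside
fence_tool itself, so the script does not need to track or kill a child process.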

