
Re: [Linux-cluster] self_fence for FS resource in RHEL 6.x operational?



On 01/22/2013 06:22 PM, Robert Hayden wrote:
> I am testing RHCS 6.3 and found that the self_fence option for a file
> system resource no longer functions as expected.  Before I log an
> SR with RH, I was wondering if the design changed between RHEL 5 and RHEL 6.
> 
> In RHEL 5, I see logic in /usr/share/cluster/fs.sh that runs a
> "reboot -fn" command when self_fence is set and the unmount fails.  In
> RHEL 6, there is little to no logic around self_fence in the fs.sh file.

The logic has just been moved to a common file shared by all *fs
resources (fs-lib).



> 
> Example of RHEL 5 logic in fs.sh that appears to be removed from RHEL 6:
>         if [ -n "$umount_failed" ]; then
>                 ocf_log err "'umount $mp' failed, error=$ret_val"
> 
>                 if [ "$self_fence" ]; then
>                         ocf_log alert "umount failed - REBOOTING"
>                         sync
>                         reboot -fn
>                 fi
>                 return $FAIL
>         else
>                 return $SUCCESS
>         fi

It is the same code, just in a different file.

> 
> 
> 
> To test in RHEL 6, I simply create a file system (e.g. /test/data)
> resource with self_fence="1" or self_fence="on" (as added by Conga). 
> Then mount a small ISO image on top of the file system.  This mount will
> cause the file system resource to be unable to unmount itself and should
> trigger a self_fence scenario.
> 
> Testing RHEL 6, I see the following in /var/log/messages:
> 
> Jan 21 16:40:59 techval16 rgmanager[82637]: [fs] unmounting /test/data
> Jan 21 16:40:59 techval16 rgmanager[82777]: [fs] Sending SIGTERM to
> processes on /test/data
> Jan 21 16:41:04 techval16 rgmanager[82859]: [fs] unmounting /test/data
> Jan 21 16:41:05 techval16 rgmanager[82900]: [fs] Sending SIGKILL to
> processes on /test/data
> Jan 21 16:41:05 techval16 rgmanager[61929]: stop on fs "share16_data"
> returned 1 (generic error)

This looks like a bug in the force_umount option.

Please file a ticket with RH GSS.

As a workaround, try disabling force_umount.

As far as I can tell, though I haven't verified it, this is the relevant code:
        ocf_log warning "Sending SIGKILL to processes on $mp"
        fuser -kvm "$mp"

        case $? in
        0)
                ;;
        1)
                return $OCF_ERR_GENERIC
                ;;
        2)
                break
                ;;
        esac

The issue is the way the fuser error is handled in the force_umount path:
it returns $OCF_ERR_GENERIC right away, before the self_fence check is
reached. That would match the log you posted.

I think the correct way would be to check whether self_fence is enabled
first, and only then return or reboot later in the script.
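
A rough, untested sketch of that idea follows. It is not the fs-lib code:
handle_umount_failure and do_self_fence are hypothetical names, ocf_log is
stubbed so the snippet runs outside rgmanager, and the reboot is replaced
by an echo so it is safe to try. The self_fence test is the same non-empty
check as in the RHEL 5 snippet quoted above.

```shell
#!/bin/sh
# Sketch only: illustrates "check self_fence before returning the error".
# Function names here are hypothetical, not taken from fs-lib.

OCF_ERR_GENERIC=1

# Stub logger: prints "<level>: <message>" to stderr.
ocf_log() {
    level=$1
    shift
    echo "$level: $*" >&2
}

# Stand-in for the real action ("sync; reboot -fn" in the RHEL 5 fs.sh).
do_self_fence() {
    echo "would run: sync; reboot -fn"
}

# Called on any unmount failure path, including the fuser error path.
# $1 = mount point, $2 = value of self_fence (empty means disabled)
handle_umount_failure() {
    mp=$1
    self_fence=$2

    ocf_log err "'umount $mp' failed"

    # Same test as the RHEL 5 code: any non-empty value enables fencing.
    if [ -n "$self_fence" ]; then
        ocf_log alert "umount failed - REBOOTING"
        do_self_fence
    fi

    return $OCF_ERR_GENERIC
}
```

With something like this, the "1)" branch of the case above would call
handle_umount_failure "$mp" "$self_fence" and return its status instead of
returning $OCF_ERR_GENERIC directly.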

Fabio



