[Linux-cluster] Re: manual fencing issue

Thu Oct 12 18:13:47 UTC 2006

On 10/12/06, Jonathan Biggar <jon at levanta.com> wrote:
> Eric Lemoine wrote:
> > Hi,
> >
> > I'm a new member of the linux-cluster mailing list.
> >
> > I'm trying to set up a 2-node cluster using RedHat Cluster based on
> > OpenAIS, with manual fencing (for now).
> >
> > When I reboot a node (l6-z-5), I get the following error messages in
> > /var/log/syslog of the other node (l6-z-12):
> >
> > Oct 12 16:38:02 l6-z-12 fenced[4508]: fencing node "l6-z-5"
> > Oct 12 16:38:02 l6-z-12 fenced[4508]: fence "l6-z-5" failed
> >
> > Probably because of that fencing failure, the service located on
> > l6-z-5 doesn't fail over to l6-z-12. The fenced error messages repeat
> > until l6-z-5 rejoins the cluster.
> >
> > Does anyone know what's going on?
>
> Did you ever run fence_ack_manual to acknowledge the manual fencing request?

Just tried.

(1) reboot l6-z-5
(2) fenced error messages are repeatedly written to /var/log/syslog on l6-z-12
(3) from l6-z-12, acknowledge manual fencing with "fence_ack_manual -n l6-z-5". This gives:

"Warning:  If the node "l6-z-5" has not been manually fenced
(i.e. power cycled or disconnected from shared storage devices)
the GFS file system may become corrupted and all its data
unrecoverable!  Please verify that the node shown above has
been reset or disconnected from storage.

Are you certain you want to continue? [yN] y
can't open /tmp/fence_manual.fifo: No such device or address"

(4) fenced error messages are still repeatedly written to /var/log/syslog.

So acknowledging the manual fence operation didn't help at all.
Actually, I don't think fenced has called fence_manual at all, because
there's no syslog message indicating that fence_manual was called, and
I do get such messages when I call fence_manual by hand.
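
A note on the error in step (3): "No such device or address" is the
message for ENXIO. On a FIFO, open() fails with ENXIO when the FIFO is
opened write-only and non-blocking while no process has the read end
open. That would fit with no fence_manual process waiting on
/tmp/fence_manual.fifo, assuming fence_ack_manual opens the FIFO that
way (an assumption, not verified here). Below is a minimal Python sketch
of the general FIFO behaviour only; the file name "demo.fifo" and the
function names are made up for the illustration and are not taken from
the cluster code:

```python
import errno
import os
import tempfile


def try_open_for_write(fifo_path):
    """Try a non-blocking write-only open of a FIFO.

    Returns None on success, or the symbolic errno name on failure.
    """
    try:
        fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None


def demo():
    """Open a FIFO for writing without, then with, a reader attached."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical FIFO standing in for /tmp/fence_manual.fifo.
        fifo_path = os.path.join(tmpdir, "demo.fifo")
        os.mkfifo(fifo_path)

        # No process holds the read end yet.
        no_reader = try_open_for_write(fifo_path)

        # Attach a reader (non-blocking so this open does not hang).
        reader_fd = os.open(fifo_path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            with_reader = try_open_for_write(fifo_path)
        finally:
            os.close(reader_fd)

        return no_reader, with_reader


if __name__ == "__main__":
    print(demo())
```

On Linux the first attempt should report ENXIO and the second should
succeed, so an ENXIO from fence_ack_manual would point at a missing
reader rather than a missing file (a missing file would give "No such
file or directory" instead).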

Thanks,

-- 
Eric