[Linux-cluster] How to "reactivate" a fenced node?
- From: Sebastian Kayser <mls skayser de>
- To: linux-cluster redhat com
- Subject: [Linux-cluster] How to "reactivate" a fenced node?
- Date: Wed, 8 Jun 2005 23:47:06 +0200
Hi all,
I have a 3-node GFS lab setup up and running on Debian sarge with a
vanilla 2.6.11 kernel and the FC4 CVS branch code from
http://people.redhat.com/teigland/cluster-2.6.11.tar.bz2
Two of my nodes are connected via FC (sarge-fc1, sarge-fc2) and the
third one via iSCSI (iscsi).
If I simulate a node failure on one of the FC nodes by unplugging
its network connection, the node gets fenced (fence_sanbox2) and the
other two nodes keep going. On the now-fenced node I see a lot of
I/O errors (as expected, since the node is fenced), and shortly after
that the node becomes inquorate.
Now I would like to reactivate the fenced node by
- Stopping the processes that access the shared GFS volume
- Unmounting the shared GFS volume
- Stopping the cluster daemons
- Re-enabling the FC ports
- Starting the cluster daemons (joining the cluster)
- Mounting the shared GFS volume again
- Starting what needs to be started
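For illustration, the intended sequence could be sketched as a small dry-run script. This is only a sketch under assumptions: the mount point /mnt/gfs1 and the device /dev/sdb1 are hypothetical, the choice of fuser, fence_tool and cman_tool for the individual steps is my guess at what "stopping" and "starting" would involve on this setup (other daemons such as ccsd are left out), and re-enabling the FC ports is left as a manual step because it depends on the switch.

```python
from dataclasses import dataclass
from typing import List, Optional
import subprocess


@dataclass
class Step:
    description: str
    command: Optional[List[str]]  # None marks a manual step


def rejoin_plan(mountpoint: str, device: str) -> List[Step]:
    """Return the steps from the list above, in order.

    The concrete commands are assumptions, not a verified procedure.
    """
    return [
        Step("Stop processes using the GFS volume", ["fuser", "-km", mountpoint]),
        Step("Unmount the GFS volume", ["umount", mountpoint]),
        Step("Leave the fence domain", ["fence_tool", "leave"]),
        Step("Leave the cluster", ["cman_tool", "leave"]),
        Step("Re-enable the FC ports on the switch", None),
        Step("Join the cluster", ["cman_tool", "join"]),
        Step("Join the fence domain", ["fence_tool", "join"]),
        Step("Mount the GFS volume", ["mount", "-t", "gfs", device, mountpoint]),
        Step("Restart the applications", None),
    ]


def run_plan(plan: List[Step], dry_run: bool = True) -> List[str]:
    """Log each step; execute the commands only when dry_run is False."""
    log = []
    for number, step in enumerate(plan, start=1):
        if step.command is None:
            log.append(f"{number}. {step.description} (manual)")
            continue
        log.append(f"{number}. {step.description}: {' '.join(step.command)}")
        if not dry_run:
            # check=True aborts the sequence at the first failing step
            subprocess.run(step.command, check=True)
    return log


if __name__ == "__main__":
    # Hypothetical mount point and device, dry run only
    for line in run_plan(rejoin_plan("/mnt/gfs1", "/dev/sdb1")):
        print(line)
```

With dry_run=False the sequence stops at the first command that fails, which in the situation described here would already be the first or second step.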
However, all processes on the fenced node that access the GFS volume
are blocked in such a way that I can't stop them (even with SIGKILL), so
I can't umount the still "busy" GFS volume, and therefore I can't stop
the cluster daemons. The only way I have found to regain access to the
GFS volume is to reboot the fenced node.
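One way to check which processes are stuck would be to look for tasks in uninterruptible sleep (state D in /proc/<pid>/stat), since such tasks do not act on signals until their I/O wait ends. Whether the blocked processes here are really in that state is an assumption; I have not confirmed it. A minimal sketch (the function names are made up for this example):

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Extract (pid, command name, state) from a /proc/<pid>/stat line.

    The command name is enclosed in parentheses and may itself contain
    spaces or parentheses, so split on the first '(' and the last ')'.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """List (pid, command name) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited or stat was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```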
The last message written to syslog on the fenced node is
Jun 8 21:29:05 sarge-fc2 kernel: GFS: fsid=cluster:gfs1.1: telling LM to withdraw
but that doesn't seem to have any effect. I also tried a 'gfs_tool
withdraw', to no avail.
Is this behaviour (i.e. unkillable processes) by design? Is it possible
to regain GFS access without rebooting the node?
Regards,
Sebastian