
Re: [Linux-cluster] Fencing?



On Mon, 2005-10-17 at 09:16 +0300, Omer Faruk Sen wrote:

> (maybe the node wasn't dead and will try to write
> something to shared storage which can cause catastrophic damage if GFS is
> not used) write something to file system.

Correct, except it causes catastrophic damage in any case, regardless of
whether or not GFS is used.  GFS requires fencing in order to operate.

> It does this using power
> switches or other methods such as IPMI or iLO. (I heard there was a new
> module for fencing that uses VMware.)

GFS can use fabric-level fencing - that is, you can tell the iSCSI
server to cut a node off, or ask the Fibre Channel switch to disable a
port.  This is in addition to "power-cycle" fencing.

> Thus I think this fencing concept is the same as STONITH in linux-ha.org
> which means Shoot The Other Node In The Head(Heart)....

STONITH, STOMITH, etc. are indeed implementations of I/O fencing.

Fencing is the act of forcibly preventing a node from accessing shared
resources after that node has been evicted from the cluster.  Its
purpose is to avoid corruption.

The canonical example of when it is needed is the live-hang scenario, as
you described:

1. node A hangs with I/Os pending to a shared file system
2. node B and node C decide that node A is dead and recover resources
allocated on node A (including the shared file system)
3. node A resumes normal operation
4. node A completes I/Os to shared file system

At this point, the shared file system is probably corrupt.  If you're
lucky, fsck will fix it -- if you're not, you'll need to restore from
backup.  I/O fencing (STONITH, or whatever we want to call it) prevents
the last step (step 4) from happening.
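
The four steps above can be sketched as a toy simulation.  This is only
an illustration, not how GFS or any real fence agent works: the
SharedStorage class, the "journal" block name and the live_hang()
function are all hypothetical, and "fencing" here is just a set of node
names whose writes the storage refuses.

```python
class SharedStorage:
    """Toy stand-in for a shared file system (hypothetical, for illustration)."""

    def __init__(self):
        self.fenced = set()   # nodes whose I/O is refused
        self.owner = {}       # block name -> node that currently owns it
        self.corrupt = False

    def write(self, node, block):
        """Return True if the write reached storage, False if it was refused."""
        if node in self.fenced:
            return False      # fenced node: the I/O never completes
        if self.owner.get(block) != node:
            # A node that no longer owns the block wrote to it anyway.
            self.corrupt = True
        return True


def live_hang(fence_before_recovery):
    """Replay steps 1-4; return (step_4_completed, storage_corrupt)."""
    storage = SharedStorage()

    # Step 1: node A hangs with an I/O pending to a block it owns.
    storage.owner["journal"] = "A"
    pending = ("A", "journal")

    # Step 2: B and C decide A is dead; B recovers A's resources.
    if fence_before_recovery:
        storage.fenced.add("A")
    storage.owner["journal"] = "B"

    # Step 3: node A resumes.  Step 4: A tries to complete its pending I/O.
    completed = storage.write(*pending)
    return completed, storage.corrupt


if __name__ == "__main__":
    print("without fencing:", live_hang(False))
    print("with fencing:   ", live_hang(True))
```

Without fencing, step 4 completes and the stale write lands on a block
that B now owns.  With fencing, the write is refused and the block is
left alone.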

How fencing is done (power cycling via external switch, SCSI
reservations, FC zoning, integrated methods like IPMI or iLO, manual
intervention, etc.) is unimportant - so long as whatever method is used
can guarantee that step 4 cannot complete.
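
One way to picture "the method is unimportant" is as a list of
interchangeable callables, each of which reports whether it managed to
cut the node off.  The sketch below is an assumption made for
illustration, not the interface of any real cluster software:
fence_node(), power_cycle() and fabric_cutoff() are hypothetical names,
and trying the methods in order is just one possible policy.  The point
it shows is that recovery may only proceed once some method has
confirmed the fence.

```python
from typing import Callable, Iterable, Tuple

# A fence method is a (name, callable) pair; the callable returns True only
# if it can guarantee the node can no longer reach shared storage.
FenceMethod = Tuple[str, Callable[[str], bool]]


def fence_node(node: str, methods: Iterable[FenceMethod]) -> str:
    """Try each method in order; return the name of the one that succeeded.

    Raises RuntimeError if no method could confirm the fence, in which case
    the caller must not recover the node's resources.
    """
    for name, method in methods:
        try:
            if method(node):
                return name
        except Exception:
            # A failing method gives no guarantee; fall through to the next.
            continue
    raise RuntimeError(f"could not fence {node}; recovery must not proceed")


# Hypothetical stand-ins for real fencing mechanisms.
def power_cycle(node: str) -> bool:
    return False  # pretend the power switch is unreachable


def fabric_cutoff(node: str) -> bool:
    return True   # pretend the switch port was disabled


if __name__ == "__main__":
    used = fence_node("A", [("power", power_cycle), ("fabric", fabric_cutoff)])
    print("fenced node A via", used)
```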

-- Lon

