[Linux-cluster] post_fail_delay

riaan at obsidian.co.za riaan at obsidian.co.za
Tue Jun 6 13:23:01 UTC 2006


Having researched post_fail_delay in the archives extensively, I have the
following scenario and question:

I would like an errant GFS node to be able to create network/disk dumps
before being power fenced. Am I missing something, or is leaving the errant
node unfenced for any significant amount of time (enough to complete the
dump, assuming that takes upwards of a few seconds) just a bad idea?
As far as I understand, the whole idea of fencing is to prevent the node from
damaging the file system in the first place, which makes collecting dumps and
power fencing fundamentally at odds with each other.

The only way I can see fencing and dumping being used together is with I/O
fencing (and I/O fencing alone, i.e. no power fencing as a second level). The
cluster I/O fences the node immediately, but the node remains up so that it
can complete the dump. Recovery entails rebooting the node and re-enabling the
port (all manual). Even in this setup, post_fail_delay is still set to 0.
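
For reference, a minimal sketch of where this setting lives and how one might
check it. It assumes post_fail_delay is an attribute of the fence_daemon
element in cluster.conf and that a missing attribute means 0; the cluster
name, the post_join_delay value and the helper function name are placeholders
made up for illustration, not taken from a real configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down cluster.conf. The cluster name and the
# post_join_delay value are placeholders for illustration only.
CLUSTER_CONF = """\
<cluster name="example" config_version="1">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
</cluster>
"""


def post_fail_delay(conf_xml: str) -> int:
    """Return post_fail_delay in seconds.

    Assumption: when the fence_daemon element or the attribute is absent,
    the delay is treated as 0.
    """
    root = ET.fromstring(conf_xml)
    daemon = root.find("fence_daemon")
    if daemon is None:
        return 0
    return int(daemon.get("post_fail_delay", "0"))


if __name__ == "__main__":
    delay = post_fail_delay(CLUSTER_CONF)
    if delay == 0:
        print("post_fail_delay is 0: a failed node is fenced without delay")
    else:
        print(f"post_fail_delay is {delay}s: a failed node stays unfenced that long")
```

Any value above 0 is the window during which the failed node is left
unfenced, which is exactly the trade-off being asked about here.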

To summarize, as I see it (please feel free to correct me):

To ensure data integrity:
- Always use a post_fail_delay of 0, whether you are using power or I/O
  fencing.
- When using power fencing (alone or with I/O fencing), you cannot use
  netdump/diskdump - otherwise the server will be fenced (rebooted) before
  being able to complete the dump.
- When you must have the ability to netdump/diskdump, use I/O fencing (and
  only I/O fencing), and time the manual restore/unfence so that the dump has
  time to complete.
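
The three rules above can be written down as a small truth table. This is
only a sketch of the summary as stated in this post, with post_fail_delay
assumed to be 0 throughout; it is not a description of confirmed cluster
behaviour, and the function name is made up for illustration.

```python
def dump_compatible(power_fencing: bool, io_fencing: bool) -> bool:
    """Can netdump/diskdump finish, per the summary in this post?

    Assumes post_fail_delay = 0. With any power fencing configured the node
    is rebooted before the dump completes; with I/O fencing alone the node
    stays up, cut off from storage, until it is manually restored.
    """
    if not (power_fencing or io_fencing):
        raise ValueError("at least one fencing method must be configured")
    return io_fencing and not power_fencing


if __name__ == "__main__":
    # Print the three configurations covered by the summary.
    for power, io in [(True, False), (True, True), (False, True)]:
        verdict = "dump can complete" if dump_compatible(power, io) else "no dump"
        print(f"power={power!s:5} io={io!s:5} -> {verdict}")
```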

tnx
Riaan
