[SUMMARY] 2 Linux boxes, failover, & 1 EXT3 RAID


Many warm thank yous to Bill Rugolsky Jr. and Stephen Tweedie for their help on
this one.  Both pointed out that since the file system is journaled, if the 
primary box (nas1) were to crash, the secondary box should mount the ext3 file
system without any problems.  Depending on the nature of the journal (metadata
journaling and/or data journaling), we may have little or no data loss.

Bill Rugolsky also pointed out that I may see some performance benefit from the
data=journal option, since I am exporting the EXT3 filesystem with the
"rw,sync,no_wdelay" options, thus forcing NFS to do synchronous commits.  His
reasoning is that with the "data=ordered" and "sync" options for EXT3 and NFS,
the system will have to work harder to write out data blocks and may need to
"seek all over the disk to do so."  This will decrease throughput.
However, with "data=journal", the NFS-forced syncs will write the data to a
(likely) contiguous journal (less disk seeking, less latency, increased
throughput) and allow the kernel to do its actual disk commits on its own.
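
For anyone who wants to compare the two journaling modes, here is a minimal
sketch of the mount variants discussed above.  The helper name mount_cmd is
hypothetical, and it only prints the command rather than running it; the
device, mount point and export options are the ones from my setup, while the
client name in the comment is a placeholder.

```shell
# Export line in /etc/exports (client1 is a placeholder host name):
#   /nas  client1(rw,sync,no_wdelay)

# Hypothetical helper: print (do not run) the ext3 mount command for a
# given data journaling mode, rejecting anything ext3 does not know.
mount_cmd() {
    mode="$1"
    case "$mode" in
        journal|ordered|writeback) ;;
        *) echo "unknown data mode: $mode" >&2; return 1 ;;
    esac
    echo "mount -t ext3 -o data=$mode /dev/sdb1 /nas"
}
```

Running "mount_cmd journal" prints the command for full data journaling, and
"mount_cmd ordered" prints the one matching the mode shown in my logs below.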

Best Regards,
Bill Antoniadis


Following is my original email:
I have two Red Hat 7.2 (2.4.9-31) boxes that are attached to one external RAID
unit.  Both boxes are able to see the RAID unit as /dev/sdb1, but only
one box mounts (cat /proc/mounts yields: /dev/sdb1 /nas ext3 rw 0 0) the unit
at any given time.  The other box listens, via heartbeat (linux-ha), waiting to
mount the RAID unit should its sibling crash (more precisely, should the
heartbeat no longer be heard over serial and Ethernet).  The /nas directory is
NFS exported with the rw,sync,no_wdelay options to several Linux and Tru64
boxes.

What will I encounter should the primary (i.e. the box currently mounting
/dev/sdb1) crash, and the backup take over?  From my simulations, I see the
backup mount /dev/sdb1, but I get the following in its /var/log/messages:

nas2 kernel: kjournald starting.  Commit interval 5 seconds
nas2 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
nas2 kernel: EXT3 FS 2.4-0.9.11, 3 Oct 2001 on sd(8,17), internal journal
nas2 kernel: EXT3-fs: recovery complete.
nas2 kernel: EXT3-fs mounted filesystem with ordered data mode.
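
A takeover script could watch for that warning automatically.  The following
is only a sketch: the function name mounted_with_errors is hypothetical, and
it assumes the kernel messages are fed to it on standard input (for example,
from /var/log/messages).

```shell
# Hypothetical helper: succeed if the log text on stdin contains the ext3
# "mounting fs with errors" warning shown above.
mounted_with_errors() {
    grep -q 'EXT3-fs warning: mounting fs with errors'
}
```

For example, "mounted_with_errors < /var/log/messages" returns success when
the warning is present, which a script could use to page an admin.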

My limited understanding is that, since both the primary box (named "nas1") and
the secondary (named "nas2") keep a metadata-only journal, data updates were
flushed to disk (on nas1) but the metadata changes were not committed, so nas2
sees an inconsistent filesystem when mounting.  Am I

If we run with the nas2 box for a while, and then decide to switch back to
nas1, how will nas1, and its journal playback, react to the changes committed
by nas2 since the crash?

Would it be safer to always run e2fsck on nas2 takeover, prior to mounting
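
To make the question concrete, a check-then-mount takeover might look roughly
like the sketch below.  The function names fsck_allows_mount and takeover are
hypothetical, and treating an e2fsck exit status below 4 as "safe to mount"
is my assumption, based on e2fsck's exit status being a bit mask in which 4
means errors were left uncorrected and 8 or higher means an operational
failure.

```shell
# Hypothetical helper: decide from e2fsck's exit status whether to mount.
# 0 = clean, 1 = errors corrected, 2 = corrected (reboot advised),
# 4 = errors left uncorrected, 8 and up = operational failures.
fsck_allows_mount() {
    status="$1"
    [ "$status" -lt 4 ]
}

# Hypothetical takeover step: check the device, then mount it only if the
# check left no uncorrected errors.
takeover() {
    dev="$1"
    mnt="$2"
    e2fsck -p "$dev"
    status=$?
    if fsck_allows_mount "$status"; then
        mount -t ext3 "$dev" "$mnt"
    else
        echo "e2fsck exit status $status on $dev; not mounting" >&2
        return 1
    fi
}
```

In my setup this would be called as "takeover /dev/sdb1 /nas" from whatever
resource script heartbeat runs on the surviving box.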

Am I wrong in choosing EXT3 over EXT2 in this setup?

Any help is greatly appreciated.

Thanks in advance,
Bill Antoniadis

