External Journal scenario - good idea?
- From: Vinnie <listacct1 lvwnet com>
- To: Ext3 Users List <ext3-users redhat com>
- Subject: External Journal scenario - good idea?
- Date: Wed, 30 Oct 2002 07:44:28 -0500
I've just recently joined the ext3-users list. I spent much of the
weekend browsing the list archives and other tidbits I could find on
the net regarding external journals and running in data=journal mode.
From what I have seen of what other folks are doing, data=journal with
an external journal may be able to help with our problem here.
If I could pick the brains of the resident gurus for a moment, and
solicit some advice, I thank everyone in advance who can take the time
to offer their opinions.
We are running a file server, which currently has as its "hard drive" an
ATA-to-SCSI external RAID subsystem. The file server is a dual
Pentium-III Tualatin 1.4GHz (512K cache) server, built on a Serverworks
HESL-T chipset, with 2GB ECC Registered SDRAM.
The RAID unit is a Promise UltraTrak100-TX8, with 8 Western Digital
WD1200JB 120GB ATA100 7200rpm hard drives installed. 7 of the 8 drives
are joined to a RAID5 array, the 8th is an unassigned hot spare. The
UltraTrak's SCSI interface is an Ultra2-LVD (80MB/sec) interface,
connected via its external 68-pin MicroD cable, to a custom Granite
Digital internal-to-external "Gold TPO" ribbon cable - which leads to
the "B" channel of the onboard AIC7899W Ultra160 SCSI interface. The
RAID unit is the only SCSI device attached to this channel at this time,
and is terminated with a Granite Digital SCSI-Vue active diagnostic
terminator. I have no indication or suspicion whatsoever of any SCSI
bus problems. (I have also run the same UltraTrak unit with the same
diag terminator on an AHA2940U2W in the "old" file server, with the
same write performance issues, described below.)
Currently, the array is partitioned with a /boot partition, and a /
partition, each as ext3 with the default data=ordered journaling mode.
I have begun to realize gradually why it is a decent idea to break up
the filesystem into separate mount points and partitions, and may yet
end up doing that. But that's a rabbit to hunt another day, unless
taking care of this is also required to solve this problem.
This file server performs 5 key fileserver-related roles, due to its
having the large RAID5 file storage for the network:
1. Serves the mailboxes for our domain to the two frontend mail/web
servers via NFS mount
2. Runs the master SQL server - the two mail/web servers run local slave
copies of the mail account databases
3. Stores the master copy of web documents served by the web servers
(and will replicate them to web servers when documents change, still
working on this though)
4. Samba file server for storage needs on the network
5. Limited/restricted-access FTP server for web clients
For the most part, the file server runs great and does its job quite
well. However, there are two main circumstances in which things go
downhill quite badly:
1. Daily maintenance-type cron events (like updatedb)
2. Other heavy file WRITE activity, such as when Samba clients are
backing up their files to this server from the network. We regularly
have some very large files being copied over to the file server via
Samba (1 GB drive image files, for example)
In both cases, or other cases of heavy file I/O (mainly writes), this
server pretty much grinds to a halt. It starts grabbing up all of the
available RAM as page cache for dirty data, presumably because the RAID
unit cannot write it to disk that fast. The inevitable flush is stalled
as long as possible, but eventually the backlog uses up all available
system RAM (we have 2GB in this puppy now), until the kernel is forced
to write synchronously to free up cache pages for fresh data coming in. While
this is going on, might as well forget delivering/retrieving an email
to/from mailboxes, or getting much anything else out of the server. We
have seen "NFS Server Not Responding" errors, and MySQL errors too (from
the vpopmail libs trying to look up the username/pw and mailbox location).
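(For anyone who wants to watch this happen: a minimal sketch, run while one of those big Samba copies is in flight. The field names are from /proc/meminfo; the exact layout varies a bit between kernel versions.)

```shell
# Snapshot of where the RAM is going during a heavy write: Cached
# balloons while the write backlog builds up and MemFree collapses;
# after the forced synchronous flush, things level off again.
grep -E '^(MemFree|Buffers|Cached):' /proc/meminfo
```

Repeat it every second or so (e.g. under watch) to see the cache grab in real time.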
Once the "emergency/panic" sync writing to disk is complete, the server
reverts to running great (although Linux never seems to de-allocate
RAM it has grabbed for cache -- that is, until it absolutely HAS to give
it back).
From what I've been reading, this seems to be normal for 2.4-series
kernels (I'm running a modified 2.4.18 on this server, patched with the
various NFS patches plus recent iptables); they really like to use RAM
for cache. And I suppose that RAM works better doing SOMETHING than
just sitting there looking pretty as "available" memory.
I also understand that RAID5 is not known for great write performance.
Add to that an ext3 filesystem, which adds some overhead of its own for
the journaling work.
We really need to solve this problem. We're also seeing "NFS Server not
responding" errors in the logs every day during maintenance runs, and
pretty much any other time heavy disk activity is going on, so mail
performance is being affected. Mail users get username/pw errors
(sometimes it even tells them it couldn't contact the MySQL update server).
It's definitely not a server horsepower problem. ;) But I can see where
it could be a write speed issue with the RAID unit. Unless this is just
the way the linux kernel does things (which I am afraid may be the case).
THOUGHTS ABOUT USING AN EXTERNAL DATA=JOURNAL SETUP
After reading many posts in the archives here and other things I could
find, I have considered setting up a separate pair of quick drives in a
RAID1 array as an external journal, and setting DATA=JOURNAL mode on the
root filesystem mount.
This strikes me as a possible write performance improver, if doing so
will allow the larger writes to be "satisfied" faster because they only
have to be written to the journal drive pair, without all the overhead
of having to write to the RAID5 array. I realize that the data still
has to be written to the main filesystem on the RAID5 array, and that
this will actually cause more work. I'm just wondering if the journal
updating to the actual filesystem is more of a background thing which
does not affect the responsiveness of the file server. We would
probably make the journal size close to the full size of the RAID1 array.
Does this seem like a viable option to improve or eliminate the server
responsiveness problems? Or do any of the gurus out there have any
better suggestions? We can't fit an NVRAM-based external journal device
in the budget.
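For reference, data=journal is selected per mount, so if this tests out we would end up with fstab entries along these lines (device names here are illustrative, not our actual layout):

```
/dev/sda1   /boot   ext3   defaults                1 2
/dev/sda2   /       ext3   defaults,data=journal   1 1
```

For the root filesystem the option may also need to reach the kernel at boot (e.g. via the rootflags= boot parameter), depending on how early / gets mounted.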
CAN WE CHANGE JOURNAL LOCATION ON EXISTING EXT3 PARTITIONS?
One other snag we may run into is that the / partition already has a
journal (an internal one, possibly visible as /.journal), since it's
already an ext3 partition. Is it possible to tell the system we want
the journal somewhere else instead? It strikes me that when we're ready
to move to the external journal, we may have to mount the / partition
as ext2, remove the internal journal, then create the new external one
and point the / partition at it with the e2fsprogs tools?
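From the man pages, the e2fsprogs tools do appear to support exactly this, without dropping back to ext2 first. A sketch, assuming the journal RAID1 shows up as /dev/md0 and the root filesystem is /dev/sda2 (both names hypothetical), run from rescue media with the filesystem unmounted:

```shell
# All device names are assumptions; do this with /dev/sda2 unmounted.

# 1. Format the RAID1 pair as a dedicated journal device.  The block
#    size must match the filesystem that will use it.
mke2fs -b 4096 -O journal_dev /dev/md0

# 2. Remove the existing internal journal from the filesystem.
tune2fs -O ^has_journal /dev/sda2

# 3. Attach the external journal.
tune2fs -j -J device=/dev/md0 /dev/sda2

# 4. Check the filesystem before rebooting.
e2fsck -f /dev/sda2
```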
Thanks in advance for all thoughts, opinions, and suggestions. I'll
provide whatever other details are necessary.

Vinnie