Re: External Journal scenario - good idea?
- From: Vinnie <listacct1 lvwnet com>
- To: Ext3 Users List <ext3-users redhat com>
- Subject: Re: External Journal scenario - good idea?
- Date: Fri, 01 Nov 2002 15:31:50 -0500
Jeremy Rumpf wrote:
On Wednesday 30 October 2002 07:44 am, Vinnie wrote:
Yep, now (I think) I understand. Since I have one large / filesystem,
all writes go through the same "funnel". All writes have to use the
same journal, going to the same "drive" (array). Since the same drives
that serve the shared dirs for SMB clients are also involved in
reads/writes to the NFS mailbox dirs and other stuff, NFS requests and
MySQL requests have to "get in line" with SMB requests when they hit the
disks.
Currently, the array is partitioned with a /boot partition, and a /
partition, each as ext3 with the default data=ordered journaling mode.
I have begun to realize gradually why it is a decent idea to break up
the filesystem into separate mount points and partitions, and may yet
end up doing that. But that's a rabbit to hunt another day, unless
taking care of this is also required to solve this problem.
This is _very_ advisable.
But if these other requests (NFS mailboxes, MySQL, etc.) are on separate
spindles, drives which are not part of the RAID5 array, they are in a
different line waiting to be processed. This makes sense.
Currently our complete usage of the single RAID5 array is right around
100GB. It is mostly file storage/backups from other hosts on the
network. This will no doubt represent the largest file storage
requirements of all the fileserver functions for this machine.
This file server performs 5 key fileserver-related roles, due to its
having the large RAID5 file storage for the network:
1. Serves the mailboxes for our domain to the two frontend mail/web
servers via NFS mount
2. Runs the master SQL server - the two mail/web servers run local slave
copies of the mail account databases
3. Stores the master copy of web documents served by the web servers
(and will replicate them to web servers when documents change, still
working on this though)
4. Samba file server for storage needs on the network
5. Limited/restricted-access FTP server for web clients
Do any of these require more than 120GB of storage (meaning are they too large
to fit on a single 120GB RAID1 set)?
In light of the smaller amount of space really needed for all of the
other functions combined, and the fact that for each 120GB drive we
pull off the RAID5 array we lose around 100GB of RAID5 storage capacity
(and the drives would have to be removed from the array in PAIRS for
each RAID1 array we created in this external 8-bay unit), it seems the
best use of the external RAID enclosure and the 120GB drives in it
would be to create the other arrays elsewhere, and keep the large array
for file storage. If I am to keep a RAID5 array going - I'm going to
have to think about this some and decide if I can settle for something
else, like a RAID0+1 array, or smaller RAID1 arrays.
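As a quick sanity check on the capacity arithmetic above (assuming ~120GB raw per drive; real usable space runs lower after formatting, which is roughly where the ~100GB-per-drive figure comes from):

```python
def raid5_capacity(n_drives, drive_gb):
    # RAID5 spends one drive's worth of space on distributed parity
    return (n_drives - 1) * drive_gb

def raid1_capacity(n_drives, drive_gb):
    # RAID1 mirrors pairs: usable space is one drive's worth per pair
    return (n_drives // 2) * drive_gb

# 7-drive RAID5 vs. pulling a pair of drives out for a RAID1 set
print(raid5_capacity(7, 120))   # 720 GB raw usable
print(raid5_capacity(5, 120))   # 480 GB raw usable
print(raid1_capacity(2, 120))   # 120 GB raw usable
```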
As you said, using a pair of 120GB drives for each RAID1 array used for
other storage purposes (mailboxes, ftp, SQL database) would be a really
big waste of space.
Also, I'm not so sure I would be gaining much advantage to make RAID1
arrays in the same external unit, assuming I still had a RAID5 array in
the same unit. That is, if what I am seeing has much or anything to do
with the parity calculation speed of the RAID controller in this
external subsystem. If it is swamped with XOR calculations while
writing to a 7 drive array, it would probably not be much less swamped
calculating parity data for a 4-5 drive array, and even a separate RAID1
array working behind the same RAID controller may suffer write
performance issues because the data has to be processed by the same RAID
controller to actually get written to the RAID1 drives.
But I am really not even sure that what we're seeing here is a problem
with the speed of the RAID controller. From some other reading I have
done, it seems that grabbing up RAM to cache writes and combine it all
into one big write is something that the 2.4 kernel series is rather
notorious for. I saw an article/review of external RAID subsystems
(both SCSI and ATA-to-SCSI type) which said the same thing - that
Windows 2000 servers were a lot better at asynchronous I/O than kernel
2.4-based Linux, and proceeded to describe much of the same malady I
have been seeing here. They did say that a lot of work is going into
newer Linux kernels to make it better at async disk I/O.
I did try building a 2.4.19 kernel this past weekend, and it crashed
MISERABLY during a large write test. Major SCSI driver error messages,
and it hung the SCSI bus to the point that I had to not only hit the
reset button on the server, but also cycle the power on the RAID unit,
before I could successfully RE-boot. I saw in the Changelogs for 2.4.19
that the Adaptec 78xx drivers have been revamped a couple times since
2.4.18. I guess I'm just going to have to stay with 2.4.18 for a while.
I have performed the recommended bdflush sysctl tweak to try to make the
kernel write dirty buffers more often, and while I am seeing a marked
increase in SCSI bus activity, write performance doesn't seem to have
improved a great deal. But from the "free" command (and this has always
been the deal), it's not the "buffers" RAM usage that is so high when
heavy disk write I/O is going on, it's the "cached" RAM usage that goes
through the roof.
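For reference, the bdflush tweak being described looks something like this on a 2.4 kernel. The values below are purely illustrative, not a recommendation; the field meanings shifted between 2.4.x releases, so check Documentation/sysctl/vm.txt for the running kernel before changing anything:

```shell
# /proc/sys/vm/bdflush takes nine fields on 2.4 kernels.
# The first field (nfract) is the percentage of the buffer cache that
# may be dirty before bdflush starts writing buffers out; lowering it
# makes the kernel flush dirty buffers sooner and more often.
cat /proc/sys/vm/bdflush            # note the current values first
echo "30 500 0 0 500 3000 60 20 0" > /proc/sys/vm/bdflush
```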
I am going to split up the single large filesystem into multiple mounts
as you suggested, as this much more clearly (thanks to your reply) is a
good idea. But I am concerned that even after doing this, since it is
the same kernel with its same "cache it first, then write it all at
once" semantics, that I may not be in much better shape.
It's really a shame to suspect so strongly that I would get the most
improved write performance out of this machine by dropping from 2GB of
RAM to 256MB. ;) Operating on the concept that if it has nowhere to
cache it, it HAS to write more often... ;)
I was considering the massive journal size for the samba share mount on
the idea that if the journal is big enough to be a "staging area" for
file copy operations from clients that may total out around 2GB or more
(possibly), that we could keep the journal commit activity largely an
asynchronous operation, rather than a chain of panic-mode synchronous
operations because we are straddling that 25-50 percent full trigger
until the data stops coming from the client machine. But I'm not 100%
sure I understand how it all works just yet; I have to do some more
reading. It could actually be counter-productive to have such a large
journal.
Remember though, you can move the journal to an external device at any time. I
would heavily recommend that you break up your spindles and allocate the
journal with the filesystem (a large journal with the filesystem) to start
out with. Then if performance still demands it, grab some small(er) disks and
move the journals off to them.
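The later migration to an external journal can be sketched with the e2fsprogs tools; device names here are hypothetical, the filesystem must be unmounted (or handled from a rescue environment in the case of /), and the external journal device must be created with the same block size as the filesystem that will use it:

```shell
# 1. remove the existing internal journal from the filesystem
tune2fs -O ^has_journal /dev/sda2

# 2. create a journal device on the small dedicated disk
#    (-b must match the filesystem's block size)
mke2fs -O journal_dev -b 4096 /dev/sdc1

# 3. attach the filesystem to the external journal
tune2fs -j -J device=/dev/sdc1 /dev/sda2
```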
When I say large journal, I usually think around the 250MB range. I personally
wouldn't recommend allocating a super large one (greater than 1GB), but
I'll step aside and let the FS experts advise on that issue.
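For what it's worth, the journal size can be set explicitly at mkfs time. The device name below is hypothetical; -J size= is in megabytes, and I believe mke2fs caps an ext3 journal at 102400 filesystem blocks (400MB with 4k blocks):

```shell
# create an ext3 filesystem with a ~256MB internal journal
mke2fs -j -J size=256 /dev/sdb1

# confirm the journal's size and location afterwards
dumpe2fs -h /dev/sdb1 | grep -i journal
```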
So on these (above), have them at least on separate partitions.
Possibly the same drive, but at least separate partitions? (which would
give them separate journals). And on the ones below:
CAN WE CHANGE JOURNAL LOCATION ON EXISTING EXT3 PARTITIONS?
One other snag it seems we may run into is the fact that the / partition
already has a journal (/.journal, I presume), since it's already an ext3
partition. Is it possible to tell the system we want the journal
somewhere else instead? Strikes me that when we're ready to move to the
external journal, we may have to mount the / partition ext2, then remove
the journal, and create the new one and point the / partition to it with
the e2fs tools?
Yes, except I would _not_ advise moving the / partition journal to an external
device. The / partition should have very little activity (assuming /var or
/var/log is a separate file system). This is the prime reason you should not
be allocating one huge / filesystem. Break it up into something like:
since this is where the majority of the real file activity is going on,
put each of these on separate drives (or RAID1 arrays), so we not only
have separate journals, but separate spindles too?
and create special mounts for your samba, mysql, webroot (NFS), mail
(NFS), and so on.
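A layout along those lines might look like this in /etc/fstab; the devices and mount points are purely illustrative, one spindle (or RAID1 pair) per heavy-activity filesystem, each with its own journal:

```shell
# /etc/fstab - illustrative split of the single big / filesystem
/dev/sda1   /boot            ext3   defaults                1 2
/dev/sda2   /                ext3   defaults                1 1
/dev/sdb1   /var             ext3   defaults,data=ordered   1 2
/dev/sdc1   /home/samba      ext3   defaults,data=ordered   1 2
/dev/sdd1   /var/lib/mysql   ext3   defaults,data=ordered   1 2
/dev/sde1   /var/spool/mail  ext3   defaults,data=ordered   1 2
```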
Jeremy thank you so much for your reply. This has really given me a lot
to chew on. And looking at my watch I see that it's Friday again..
meaning I can actually work on this for a few days... <grin>.