Re: External Journal scenario - good idea?
- From: Jeremy Rumpf <jrumpf heavyload net>
- To: Vinnie <listacct1 lvwnet com>
- Cc: Ext3 Users List <ext3-users redhat com>
- Subject: Re: External Journal scenario - good idea?
- Date: Fri, 1 Nov 2002 17:14:52 -0400
> Yep now (I think) I understand. Since I have one large / filesystem,
> all writes go through the same "funnel". All writes have to use the
> same journal, going to the same "drive" (array). Since the same drives
> are involved writing to the shared dirs for SMB clients, as those which
> are involved with reads/writes to NFS mailbox dirs and other stuff, NFS
> requests and MySQL requests have to "get in line" with SMB requests when
> it's busy.
> Currently our complete usage of the single RAID5 array is right around
> 100GB. It is mostly file storage/backups from other hosts on the
> network. This will no doubt represent the largest file storage
> requirements of all the fileserver functions for this machine.
> In light of the smaller amount of space really needed for all of the
> other functions (combined), and the fact that for each 120GB drive we
> pull off the RAID5 array we will lose around 100GB of RAID5 storage
> capacity (though the drives would have to be removed from the array in
> PAIRS for each RAID1 array we were to create in this external 8-bay
> unit), it seems that the best usage of the external RAID enclosure and
> the 120GB drives we have in it, would be to create the other arrays
> elsewhere, and keep the large array for file storage. If I am to keep a
> RAID5 array going - I'm going to have to think about this some and
> decide if I can settle for something else, like a RAID0+1 array, or
> smaller RAID1 arrays.
> As you said, using a pair of 120GB drives for each RAID1 array used for
> other storage purposes (mailboxes, ftp, SQL database) would be a really
> big waste of space.
Yeah, that's the biggest issue with these new large-capacity drives:
figuring out how to keep performance acceptable while maximizing utilized
space. Of course, at least with IDE drives, the cost of large-capacity drives
is still relatively low (compared to SCSI). I also suspect it will be
dropping off in the future even more so as Maxtor is readying a line of 320GB
> Also, I'm not so sure I would be gaining much advantage to make RAID1
> arrays in the same external unit, assuming I still had a RAID5 array in
> the same unit. That is, if what I am seeing has much or anything to do
> with the parity calculation speed of the RAID controller in this
> external subsystem. If it is swamped with XOR calculations while
> writing to a 7 drive array, it would probably not be much less swamped
> calculating parity data for a 4-5 drive array, and even a separate RAID1
> array working behind the same RAID controller may suffer write
> performance issues because the data has to be processed by the same RAID
> controller to actually get written to the RAID1 drives.
I would hope that the controller design is such that it can handle enough
operations per second to satisfy its host interface (the SCSI connection
exported to the host machine). Some things to consider, if I may.
Remember that every large write (assuming it spans multiple strips) means that
the controller has to:
1> Break up the data and issue a write to each drive to physically record the
data. This may be 5-7 write operations (1 for each drive).
2> Calculate the parity information and again generate a write to each drive
to physically record the parity data. Again, this may be 5-7 write operations
(1 for each drive).
That's a series of writes, followed by a parity calculation, followed by
another series of writes.
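To make the parity step concrete, here is a minimal sketch of the XOR
calculation, assuming equally sized strips held in memory. The function names
and the sample strips are hypothetical, and this is not how any particular
controller implements it; where the parity ends up on disk is not modeled.

```python
from functools import reduce


def xor_parity(strips):
    """Return the byte-wise XOR of a list of equally sized strips."""
    if len({len(s) for s in strips}) != 1:
        raise ValueError("all strips must have the same length")
    # zip(*strips) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild_missing(surviving_strips, parity):
    """Recover the single missing data strip from the survivors plus parity."""
    return xor_parity(list(surviving_strips) + [parity])


# Hypothetical 4-byte strips for one stripe spread over three data drives.
data = [b"mail", b"smb!", b"sql0"]
parity = xor_parity(data)

# Pretend the second drive died and rebuild its strip from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
```

The same XOR that produces the parity also rebuilds a lost strip, which is why
the controller has to run this calculation on every write to the RAID5 set and
never has to for a RAID1 pair.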
For a RAID1 pair, the operations are simpler:
1> Generate a write operation to the primary target drive
2> Generate a write operation to the mirror drive
Across the board, this is less intensive than RAID5 operation. The offsetting
factor here is that the controller would have to support multiple RAID1 sets
instead of a single RAID5 set. I would bet that the load on the controller
itself would stay about the same using 4 RAID1 sets vs 1 huge RAID5 set. The
number of write operations dispatched to the drives would be about the same
aggregated over time, and you'd be saving the overhead of calculating the
parity information.
> But I am really not even sure that what we're seeing here is a problem
> with the speed of the RAID controller. From some other reading I have
> done, it seems that grabbing up RAM to cache writes and combine it all
> into one big write is something that the 2.4 kernel series is rather
> notorious for. I saw an article/review of external RAID subsystems
> (both SCSI and ATA-to-SCSI type) which said the same thing - that
> Windows 2000 servers were a lot better at asynchronous I/O than kernel
> 2.4-based Linux, and proceeded to describe much of the same malady I
> have been seeing here. They did say that a lot of work is going into
> newer Linux kernels to make it better at async disk I/O.
Yes, the pagecache becomes a vacuum cleaner during I/O intensive periods.
This is being looked into in the development series (cachiness tuning). One
thing to maybe try:
Specifically the section on tuning the I/O Elevator.
Also has some interesting notes in it.
One other thing: if your controller's cache is set to run in "write back"
mode, try disabling that. Write-back caches on RAID controllers will
aggregate/delay writes as well. Might be worth looking into (I know it tends
to kill performance on my LSI MegaRaid at times).
> So on these (above), have them at least on separate partitions.
Yes, put these on the same disk set (as they aren't I/O intensive by far), but
keep them on separate partitions. You could then allocate the remainder of
that disk set to /usr/local and put one of the above tasks there, etc.
I also believe there's an added benefit to this as well that you're
overlooking :). If /, /var, /tmp, /usr, etc are all on the same filesystem,
one ounce of fs corruption hoses your whole machine. With them split up, /var
or /tmp can get whacked all to hell, but your machine will still boot :).
> Possibly the same drive, but at least separate partitions? (which would
> give them separate journals). And on the ones below:
> >and create special mounts for your samba, mysql, webroot (NFS), mail
> > (NFS), stuff.
> since this is where the majority of the real file activity is going on,
> put each of these on separate drives (or RAID1 arrays), so we not only
> have separate journals, but separate spindles too) ?
> Jeremy thank you so much for your reply. This has really given me a lot
> to chew on. And looking at my watch I see that it's Friday again..
> meaning I can actually work on this for a few days... <grin>.
Hope things go well,