
Re: EXT2 vs. EXT3: mount w/sync or fdatasync



Final configuration and performance results.

Changed machines (for a RAID test):
- 3ware 9550SX with BBU
- Pentium D 940
- 2G DDR2 667
- (4) 750G Seagate SATAII drives (AS series)

RAID levels:
- machine was configured for RAID5 but that was horribly slow, 12 MB/Sec
- created a (2) drive RAID0, then sliced out a 100G partition
- journal was on a separate JBOD disk
- write caching was enabled for the RAID0 and journal disk
- 64K stripes were used on the RAID0 and the JBOD journal

File system configuration:
- 100G ext3 file system
- Used a 32M journal on a physically separate device
- used "ordered" mode for the journal
- mounted with "noatime,nodiratime,noauto,noacl,nouser_xattr,dirsync"
- used the mkfs.ext3 -E stride option to set the stride to 16 (64K stripe / 4K block = 16 blocks)
   - RAID0 was using 64K stripes.
   - fs was using 4K blocks
- each file transaction did: open(),write(),fsync(),close()
- slammed 1024 1MB chunks at it
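
The per-file transaction above can be sketched in C roughly as follows. This is a minimal illustration, not the original test program: the function names, the chunk-NNNN file naming, and the directory argument are all hypothetical. It also assumes the file system is mounted with dirsync as listed above, so the new directory entry is made durable by the mount option rather than by an extra fsync() on the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* One transaction: open(), write(), fsync(), close().
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int write_chunk_synced(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may return short counts, so loop until all bytes are out. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Force file data and metadata to stable storage before closing. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Write `count` chunks into `dir` as chunk-0000, chunk-0001, ...
 * (Hypothetical naming scheme, for illustration only.) */
int write_queue(const char *dir, const void *buf, size_t len, int count)
{
    char path[4096];
    for (int i = 0; i < count; i++) {
        int n = snprintf(path, sizeof path, "%s/chunk-%04d", dir, i);
        if (n < 0 || (size_t)n >= sizeof path)
            return -1;
        if (write_chunk_synced(path, buf, len) < 0)
            return -1;
    }
    return 0;
}
```

To mirror the test described here, write_queue() would be called with a 1MB buffer and a count of 1024, pointed at a directory on the ext3 file system under test.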

I got 36 MB/Sec consistently.  That is a good sign: with the proper hardware, this should perform really well.

In production, I would probably use a RAID10 with at least 12 15K SAS/FC drives with dual controllers in Active-Active mode: failover+load balancing.  Either fiber or SAS connected.  That should scream!

Fortunately, this config needs very little space ... maybe 500G in total.  So the hardware cost is not terrible.  This config is for a queue directory that is crawled by a background process.  That process moves the data from this queue to mass "slow" storage, fiber attached SATAII 7200RPM RAID5.  The queue needs to be as fast as possible and must sync the data.  Tricky problem :)

thanks.

Andreas Dilger <adilger clusterfs com> wrote:
On Mar 22, 2007 20:44 -0700, brian stone wrote:
> Machine A connects to machine B on a gigabit lan. Machine A sends
> 1024 1MB chunks of data; 1 GB in total. Machine B, the server, reads
> in each 1MB chunk and writes it to a file.
>
> NOTE: server and client are little test programs written in C.
>
> Machine B (Server) hardware:
> - Single (no raid) Seagate Cheetah 70G Ultra320 15K
> - Quad Opteron 870
> - 16G DDR400
> - Backplane: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 8)
>
> Sync methods include:
> 1. mount with sync option
> - tried sync,dirsync which added no additional overhead
> 2. use O_SYNC open() flag
> 3. use fdatasync() just before closing the file
> - fsync() and fdatasync() produced the same results
>
>
> EXT2 tests
> ==========================================
> No sync       12.3 seconds (83 MB/Sec)
> mount=sync    44.3 seconds (23 MB/Sec)
> O_SYNC        31.7 seconds (32 MB/Sec)
> fdatasync()   31.3 seconds (32 MB/Sec)
>
>
> EXT3 tests
> ===========================================
> No sync data="" 14.5 seconds (70 MB/Sec)
> No sync data="" 17 seconds (60 MB/Sec)
> No sync data="" 65 seconds (15 MB/Sec)
> data="" O_SYNC 49 seconds (20 MB/Sec)
> data="" 52 seconds (19 MB/Sec)
> data="" fdatasync() 45.5 seconds (22 MB/Sec)
> data="" O_SYNC 72.5 seconds (14 MB/Sec)
> data="" 81 seconds (12 MB/Sec)
> data="" fdatasync() 60.5 seconds (17 MB/Sec)
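
For reference, sync methods 2 and 3 from the quoted list differ only in where the flush happens. The sketch below is an illustration under assumptions, not the original test code: the function names are hypothetical. With O_SYNC each write() returns only once the data has reached the disk, while the fdatasync() variant lets writes go through the page cache and flushes once just before close().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t left)
{
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Method 2: open with O_SYNC so every write() is synchronous. */
int save_with_osync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_all(fd, buf, len);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* Method 3: buffered writes, then one fdatasync() just before close(). */
int save_with_fdatasync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_all(fd, buf, len);
    if (rc == 0 && fdatasync(fd) < 0)
        rc = -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Method 1 (mounting with the sync option) needs no code change at all, which is why it is not shown.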

If you are doing a large number of 1MB writes then I agree that
data=journal is probably not the way to go because it means you
can get at most 1/2 of the bandwidth of the disk (unless you
create the journal on a separate disk). data=journal is good
for small writes and lots of transactions, like mail servers
that need lots of sync operations.

For large writes, I'd suggest you put the journal on a separate
device, and make it 1 or 2 GB (your server has plenty of RAM,
so that isn't a problem). Are you using EAs, like selinux or
similar? If yes, then you should also format your filesystem
with large inodes (-I 256).

You may also want to try out ext4dev with the mballoc and delalloc
patches from Alex Tomas, as this code has been optimized for
doing large power-of-two allocations in the filesystem. They've
been posted to the ext4-devel lists a couple of times.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


