[Linux-cluster] performance bottleneck on 36-disk GFS/NFS cluster

Riaan van Niekerk riaan at obsidian.co.za
Wed Jul 19 15:33:07 UTC 2006


We have a 2.5 TB GFS (6.1) filesystem with 2 TB of data, spread via RAID10 
metaLUNs on an EMC CX500. The GFS is running on 4 nodes of varying size 
(some have 24 CPU / 2 GB RAM, others 2 CPU / 1 GB RAM), exported via NFS 
to 10 NFS clients, which are POP/IMAP/SMTP servers in an ISP environment. 
The IPs are managed by rgmanager.

The data is a couple of hundred thousand mailboxes, in Maildir format.

Some performance metrics:
- Load average on the NFS servers is about 8 - 16 per NFS client mounted. 
The big servers have 4 clients each (load average 32 - 64); the smaller 
servers have 1 client each (load average 8 - 16).
- dlm_recvd is by far the busiest process in top (10 - 20%), followed by 
the nfsd processes, gfs_inoded, lock_dlm, and gfs_scand.

Here is one interval of output from "iostat -x dm-1 5":

Device:    rrqm/s  wrqm/s      r/s     w/s    rsec/s   wsec/s     rkB/s    wkB/s  avgrq-sz  avgqu-sz   await  svctm   %util
dm-1         0.00    0.00  2043.23  970.91  16345.86  7767.27   8172.93  3883.64      8.00     35.62   11.86    0.34  101.01
The notable numbers are the 101.01% utilization and the combined reads and 
writes per second in the thousands.
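
For completeness, these are the commands I am using to gather the numbers 
above (assuming a stock RHEL4 userland; adjust the device name to match 
your own dm device):

    # extended per-device stats at 5-second intervals; %util pinned near
    # 100 means the metaLUN itself is saturated
    iostat -x dm-1 5

    # NFS server-side RPC counters; Maildir workloads usually show very
    # high getattr/lookup counts relative to read/write
    nfsstat -s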

My questions are:
1. (Taking a guesstimate) Is my problem a lack of spindles, or the 
inefficiency of NFS over GFS? (I have heard a number of others on the 
list complain about NFS-on-GFS performance.)

2. What would be the best way to improve performance? We have a couple of 
options:
a) Collapse/remove the NFS layer: make a large number of the mail servers 
SAN-attached GFS nodes (perhaps not all of them, but 4 to 6, by converting 
the GFS nodes into mail servers).
b) Add more spindles. We are in the process of adding 24 more spindles 
(the CX is already taking almost a day to restripe the metaLUN). We might 
be able to add 24 more spindles and restripe again.
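
Independent of a) and b), one low-risk tune we may try first is mounting 
the GFS with noatime, since every Maildir scan otherwise turns reads into 
atime writes (and the corresponding exclusive locks). A sketch of the 
fstab entry, with a placeholder device name and mountpoint standing in 
for ours:

    # /etc/fstab - device and mountpoint below are placeholders
    /dev/mapper/vg_mail-lv_mail  /mail  gfs  noatime  0 0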

3. Is having dlm_recvd as the top process normal/typical for an I/O-bound 
GFS cluster? Even though the Maildir mail store consists of millions of 
files, nodes should very rarely write into the same directory at the 
same time (meaning that directory lock contention should be avoided).
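
For anyone who wants to sanity-check my reading of the lock traffic, this 
is roughly what I am looking at (assuming the RHEL4 /proc layout; /mail is 
a placeholder for our actual mountpoint):

    # DLM message/lock counters, if your dlm module exposes them
    cat /proc/cluster/dlm_stats

    # per-filesystem glock and operation counters; run it twice and
    # compare to see the rate of lock activity
    gfs_tool counters /mail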

thank you in advance
Riaan