[Linux-cluster] performance tuning

Shawn Hood shawnlhood at gmail.com
Mon Feb 4 16:54:16 UTC 2008


Hey all,

My company has gone live with a GFS cluster this morning.  It is a 4
node RHEL4U6 cluster, running RHCS and GFS.  It mounts an Apple 4.5TB
XRAID configured as RAID5, whose physical volumes are combined into
one large volume group.  From this volume group, five logical volumes
(each striped across the two physical volumes of the XRAID) were created.
Five GFS filesystems were created, one on each logical volume. Even
though there are currently four nodes, there are 12 journals for each
filesystem to allow for planned cluster growth.
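
For reference, the logical volumes and filesystems were created along
these lines (the stripe size and LV size below are illustrative, not
the exact values we used):

  # striped LV across the two XRAID physical volumes
  lvcreate -i 2 -I 64 -L 900G -n nlp_qa hq-san
  # GFS filesystem with 12 journals for cluster "boson"
  gfs_mkfs -p lock_dlm -t boson:nlp_qa -j 12 /dev/hq-san/nlp_qa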

Currently, each filesystem is mounted noatime, and the tunables
quota_enforce and quota_account are set to 0 (the exact invocations are
sketched after the find example below).  I have posted the results of
gfs_tool gettune /hq-san/nlp/nlp_qa below.  We have an application
which depends heavily upon a find command that lists a number of
files.  It looks something like this:
find $delta_home -maxdepth 2 -type f -name summary
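
For completeness, the mount options and quota tunables mentioned above
were applied roughly like this (reconstructed from memory, so treat it
as a sketch):

  mount -t gfs /dev/hq-san/nlp_qa /hq-san/nlp/nlp_qa -o noatime
  gfs_tool settune /hq-san/nlp/nlp_qa quota_enforce 0
  gfs_tool settune /hq-san/nlp/nlp_qa quota_account 0
  # (and the same for the other four filesystems)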

The find's output consists of thousands of files that exist on
/hq-san/nlp/nlp_qa.  This command is CRAWLING at the moment.  An ext3
filesystem would output hundreds of matches a second; this GFS
filesystem is currently outputting 100-200 matches per minute.  This is
crippling one of our applications.  Any advice on tuning this
filesystem for this kind of access would be greatly appreciated.
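
The rate above is a rough estimate, taken by simply timing the find and
counting its matches, e.g. (illustrative):

  # prints the total number of matches and the wall-clock time;
  # matches divided by elapsed time gives the rate quoted above
  time find $delta_home -maxdepth 2 -type f -name summary | wc -l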

Output from a gfs_tool df /hq-san/nlp/nlp_qa:

odin / # gfs_tool df /hq-san/nlp/nlp_qa
/hq-san/nlp/nlp_qa:
  SB lock proto = "lock_dlm"
  SB lock table = "boson:nlp_qa"
  SB ondisk format = 1309
  SB multihost format = 1401
  Block size = 4096
  Journals = 12
  Resource Groups = 3009
  Mounted lock proto = "lock_dlm"
  Mounted lock table = "boson:nlp_qa"
  Mounted host data = ""
  Journal number = 1
  Lock module flags =
  Local flocks = FALSE
  Local caching = FALSE
  Oopses OK = FALSE

  Type           Total          Used           Free           use%
  ------------------------------------------------------------------------
  inodes         15167101       15167101       0              100%
  metadata       868298         750012         118286         86%
  data           219476789      192088469      27388320       88%



Output from a df -h:

/dev/mapper/hq--san-cam_development 499G  201G  298G  41% /hq-san/nlp/cam_development
/dev/mapper/hq--san-nlp_qa 899G  794G  105G  89% /hq-san/nlp/nlp_qa
/dev/mapper/hq--san-svn_users 1.5T  1.3T  282G  82% /hq-san/nlp/svn_users
/dev/mapper/hq--san-development 499G  373G  126G  75% /hq-san/nlp/development
/dev/mapper/hq--san-prod_reports 1023G  680G  343G  67% /hq-san/nlp/prod_reports

odin / # gfs_tool gettune /hq-san/nlp/nlp_qa
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 300
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 5
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
glock_purge = 0
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 0
quota_account = 0
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
statfs_fast = 0
seq_readahead = 0


Shawn



