[Linux-cluster] GFS2 and backups (performance tuning)
Steven Whitehouse
swhiteho at redhat.com
Fri Dec 4 09:44:30 UTC 2009
Hi,
I'd suggest filing a bug in the first instance. I can't see anything
obviously wrong with what you are doing. The fcntl() locks go via the
dlm and dlm_controld, not via the glock_workqueues, so I don't think
that is likely to be the issue.
Steve.
On Thu, 2009-12-03 at 12:42 -0800, Ray Van Dolson wrote:
> We have a two-node cluster primarily acting as an NFS serving
> environment. Our backup infrastructure here uses NetBackup and,
> unfortunately, NetBackup has no PPC client (we're running on IBM JS20
> blades), so we're approaching the backup strategy in two different ways:
>
> - Run the NetBackup client from another machine and point it at the
>   NFS share on one of our two cluster nodes
> - Run rsyncd on our cluster nodes and rsync from a remote machine.
> NetBackup then backs up that machine.
>
> The GFS2 filesystem in our cluster is only storing about 90GB of data,
> but has about one million files on it (inodes used, as reported by df -i).
>
> (For the curious, this is a home directory server and we break things
> up under a top-level hierarchy with a folder for each first letter of
> a username; see the sketch below.)
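>
> Something like this, that is, where the usernames are of course just
> examples:
>
>     /domus1/a/alice
>     /domus1/b/bob
>     /domus1/c/carol
>     ...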
>
> The NetBackup-over-NFS route is extremely slow and spikes the load on
> whichever server is being backed up from. We made the following
> adjustments to try to improve performance:
>
> - Set the following in our cluster.conf file (placement sketched
>   after this list):
>
> <dlm plock_ownership="1" plock_rate_limit="0"/>
> <gfs_controld plock_rate_limit="0"/>
>
> ping_pong will give me about 3-5k locks/sec now.
>
> - Mounted the filesystem with noatime,nodiratime,quota=off (see the
>   fstab line after this list)
>
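> For completeness, those two lines sit inside the <cluster> element of
> /etc/cluster/cluster.conf, roughly like the sketch below, where the
> cluster name and config_version are placeholders:
>
>     <?xml version="1.0"?>
>     <cluster name="example" config_version="2">
>         <dlm plock_ownership="1" plock_rate_limit="0"/>
>         <gfs_controld plock_rate_limit="0"/>
>         ...
>     </cluster>
>
>     # after bumping config_version, propagate to the other node:
>     ccs_tool update /etc/cluster/cluster.conf
>
> And the matching fstab entry for the mount options, with the device
> path a placeholder for our clvm logical volume:
>
>     /dev/vg_domus/lv_domus1  /domus1  gfs2  noatime,nodiratime,quota=off  0 0
>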
> This seems to have helped a bit, but things are still taking a long
> time. I should note here that I tried running ping_pong against one of
> our cluster nodes via one of its NFS exports of the GFS2 filesystem.
> While I can get 3000-5000 locks/sec locally, over NFS it was about...
> 2 or 3 (not thousand, literally 2 or 3). A tcpdump of the NLM port
> shows the NFS lock manager on the node responding with NLM_BLOCKED
> most of the time. I'm not sure whether GFS2 or our NFS daemon is to
> blame.
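>
> For reference, the lock rates above came from runs roughly like the
> following (the test file paths are placeholders; ping_pong hammers
> fcntl() byte-range locks on the given file):
>
>     # locally on a cluster node (3 = number of nodes + 1)
>     ping_pong /domus1/pp.tmp 3
>
>     # from the NFS client, against the export
>     ping_pong /mnt/domus1/pp.tmp 3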
>
> In any case, I've set up rsyncd on the cluster nodes and am syncing
> from a remote server now (all of this via Gigabit Ethernet). I'm over
> an hour in and the client is still generating the file list. strace
> confirms that rsync --daemon is still trawling through the filesystem,
> building its list of files...
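>
> The rsyncd setup is nothing exotic; roughly the following, with the
> module name and backup paths as placeholders:
>
>     # /etc/rsyncd.conf on the cluster node
>     [homes]
>         path = /domus1
>         read only = yes
>
>     # on the remote backup server
>     rsync -a rsync://cluster-node/homes/ /backup/domus1/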
>
> I've done a blktrace dump of my GFS2 filesystem's block device and can
> clearly see glock_workqueue showing up far more than anything else.
> However, I don't know what else I can glean from these results.
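>
> For what it's worth, the trace was captured roughly like this, with
> the device path a placeholder for our clvm logical volume:
>
>     blktrace -d /dev/mapper/vg_domus-lv_domus1 -o - | blkparse -i -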
>
> Anyone have any tips or suggestions on improving either our NFS locking
> or rsync --daemon performance beyond what I've already tried? It might
> almost be quicker for us to do a full backup each time than to spend
> hours building file lists for differential backups :)
>
> Details of our setup:
>
> - IBM DS4300 Storage (12 drive RAID5 + 2 spares)
> - Exposed as two LUNs (one per controller)
> - Don't believe this array does hardware snapshots :(
> - Two (2) IBM JS20 Blades (PPC)
> - QLogic ISP2312 2Gb HBAs
> - RHEL 5.4 Advanced Platform PPC
> - multipathd
> - clvm aggregates two LUNs
> - GFS2 on top of clvm
> - Configured with quotas originally, but disabled later by
> mounting with quota=off
> - Mounted with noatime,nodiratime,quota=off
>
> # gfs2_tool gettune /domus1
> new_files_directio = 0
> new_files_jdata = 0
> quota_scale = 1.0000 (1, 1)
> logd_secs = 1
> recoverd_secs = 60
> statfs_quantum = 30
> stall_secs = 600
> quota_cache_secs = 300
> quota_simul_sync = 64
> statfs_slow = 0
> complain_secs = 10
> max_readahead = 262144
> quota_quantum = 60
> quota_warn_period = 10
> jindex_refresh_secs = 60
> log_flush_secs = 60
> incore_log_blocks = 1024
>
> # gfs2_tool getargs /domus1
> data 2
> suiddir 0
> quota 0
> posix_acl 1
> upgrade 0
> debug 0
> localflocks 0
> localcaching 0
> ignore_local_fs 0
> spectator 0
> hostdata jid=1:id=196610:first=0
> locktable
> lockproto
>
> Thanks in advance for any advice.
>
> Ray
>