
[Linux-cluster] GFS, Locking, Read-Only, and high processor loads


I'm setting up a GFS implementation and was wondering what kind of tuning parameters I can set for both read-only and read-write.

I work for a company that is migrating to a SAN, with GFS as the filesystem. We currently rsync our data from a master server to 5 front-end webservers running Apache and PHP. The rsyncs take an extraordinarily long time as our content (currently >2.5 million small files) grows, and they do not scale well as we add more front-end machines. Our thinking was to put content generated on two inward-facing editorial machines on the SAN as read/write, and to mount it read-only on our web front-ends. All temporary files and logging would write to local disk. The goal of our initial work was to create this content filesystem, mount the disks, eliminate the rsyncs, and free up our rsync server for use as a slave database server.

We used Luci to configure a node and fencing on a new front-end, and formatted and configured our disk with it. Our deploy plan was to set this machine up, put it behind the load-balancer, and have it operate under normal load for a few days to "burn it in." Once that was complete, we would begin to migrate the other four front-ends over to the SAN, mounted RO after a reinstall of the OS.

This procedure worked without much trouble until we hit the fourth machine in the cluster, where the CPU load went terrifyingly high and we got many "D" state httpd processes. Googling "uninterruptible sleep GFS php", I found references from 2006 about file locking with PHP and its use of flock() at the start of a session. We remounted the disks as "spectator" in an attempt to limit disk I/O on journals. This seemed to help, but as it was the end of the day and traffic was lower, the improvement appears to have been a false positive. The next day, CPU load was again incredibly high, and after much flailing about we went back to local ext3 disks to buy us some time.

I'm reading through this list, which is very informative. I'm attempting to tune our GFS mounts a bit, watching the output of gfs_tool counters on the filesystems, and looking for any anomalies. Here's a more detailed description of our setup:

Our hardware configuration consists of a NexSAN SATABoy populated with eight 750GB disks (RAID 5, 4.7TB), and a Brocade Silkworm 3800 for data and fencing. We purchased QLogic single-port, 4Gb HBAs for our servers. (More info available on request.)

The RAID has 4 partitions, 2 are not mounted:

	local - (not mounted) 500GB, extents 4.0MB, block size 4KB, attributes -wi-ao,
		dlm lock protocol - mount /usr/local_san (rw)
		this is a copy of /usr/local, which can be synced to all hosts
	code - (not mounted) 500GB, extents 4.0MB, block size 4KB, attributes -wi-ao,
		dlm lock protocol - mount /web/code (rw)
		this is a copy of /huffpo/web/prod, without the www content and tmp trees
	tmp - 500GB, extents 4.0MB, block size 4KB, attributes -wi-a-,
		dlm lock protocol - mount /web/prod/tmp (rw)
		this is the temporary directory for front-end web code
	www - 2TB, extents 4.0MB, block size 4KB, attributes -wi-ao,
		dlm lock protocol - mount /web/prod/www (ro)
		read-only content directory on 4 hosts (/etc/fstab options at the time were ro),
		read/write on 1 host

	we have ~2 more TB available, currently not in use

After reading the list a bit, I've come up with the following tunings for read-only:

     gfs_tool settune /web/prod/www/content glock_purge 80
     gfs_tool settune /web/prod/www/content quota_account 0
     gfs_tool settune /web/prod/www/content demote_secs 60
     gfs_tool settune /web/prod/www/content scand_secs 30

     /etc/fstab has spectator,noatime,num_glockd=32 as mount options

And the read/write host has:

     gfs_tool settune /web/prod/www/content statfs_fast 1

     /etc/fstab has num_glockd=32,noatime as mount options
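
As far as I can tell, values set with gfs_tool settune do not persist across a remount, so they would need to be reapplied on each node after every mount. A minimal sketch of a helper for that, assuming the mount point and values above; the role names "ro" and "rw" and the function names are hypothetical, and it defaults to a dry run that only prints the commands:

```python
import subprocess

# Mount point as used in the settune commands in this post.
MOUNT_POINT = "/web/prod/www/content"

# Tunables from this post, keyed by a hypothetical node role.
TUNABLES = {
    "ro": {
        "glock_purge": 80,
        "quota_account": 0,
        "demote_secs": 60,
        "scand_secs": 30,
    },
    "rw": {
        "statfs_fast": 1,
    },
}


def settune_commands(role, mount_point=MOUNT_POINT):
    """Build one 'gfs_tool settune' argument list per tunable for a role."""
    return [
        ["gfs_tool", "settune", mount_point, name, str(value)]
        for name, value in TUNABLES[role].items()
    ]


def apply_tunables(role, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in settune_commands(role):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply_tunables("ro")  # dry run: prints the four read-only tunings
```

Calling apply_tunables("ro", dry_run=False) from an init script that runs after the GFS mount would be one way to hook this in; that part is left out here.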

I've noticed that gfs_tool counters /web/prod/www/content usually shows fewer than 80k locks on the read/write host running rsync, and fewer than 10k locks on the one (and only) read-only host, whereas previously the number of locks on all hosts was ~80k.
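
To track those lock counts over time rather than eyeballing them, the counters output could be parsed into numbers. A rough sketch, assuming gfs_tool counters prints one "name value" pair per line (for example "locks 2948" and "locks held 1352"); that output format is my assumption, the function names are hypothetical, and non-integer values such as percentages are simply skipped:

```python
import subprocess


def parse_counters(text):
    """Parse 'name value' lines into a dict, keeping integer values only."""
    counters = {}
    for line in text.splitlines():
        # Split on the last space: everything before it is the counter name.
        name, _, value = line.strip().rpartition(" ")
        if name and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def read_counters(mount_point):
    """Run 'gfs_tool counters <mount_point>' and parse its output."""
    result = subprocess.run(
        ["gfs_tool", "counters", mount_point],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_counters(result.stdout)


if __name__ == "__main__":
    # Assumed sample output, not captured from a real system.
    sample = """
                                  locks 2948
                             locks held 1352
                         log space used 0.05%
    """
    print(parse_counters(sample))
```

Logging read_counters("/web/prod/www/content")["locks"] once a minute on each node would show whether the lock count climbs as nodes are added back.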

Can I be a bit more aggressive with locks on read-only filesystems with the current tunings enabled? I'm not sure what purpose the locks on a read-only filesystem serve in this instance.

Is there a better configuration for heavy reads on a GFS filesystem that is read-only? vmstat -d gives me the following for this filesystem:

     disk- ------------reads------------ ------------writes----------- -----IO------
            total merged sectors      ms  total merged sectors      ms    cur    sec
     sdc   411192  82490 3998862 7402555    607    645   10016    3837      0    695

My big fear is that, although the systems currently seem to be running without much incident, the number of locks and the system load will again run high as I add nodes back into the cluster. As we transition from using rsync to writing directly onto the SAN, the number of locks on rw hosts should go down, because the expensive directory scans will no longer be needed.

Are there certain other optimizations I could use to lower the lock counts?
