[Linux-cluster] GFS performance

Fri Jan 4 17:59:10 UTC 2008

Hi all,

I feel compelled to chime in on this GFS performance thread. We have a
three-node GFS environment running RHEL4.6 that was suffering from severe
memory exhaustion (100% utilization on a 32GB system) on all nodes and
unacceptably poor performance.  The three nodes serve five GFS file systems
ranging from 100GB to 1.2TB in size, which hold a diverse mix of very large
and very small files.

The performance degradation always coincided with the start of the backup
process, i.e. when large numbers of inodes were being read and cached, and
it was so bad that I was considering abandoning our GFS implementation
altogether.  Basic Unix commands such as df, ls and mkdir either took
several minutes to complete or never finished at all.  The only fix was to
reboot all three production nodes, and that only relieved the problem until
the next backup started.

On the recommendation of Red Hat support, I applied the tunable GFS
parameter that Wendy describes in
http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
by setting glock_purge to 50 for all file systems, and it has made a
dramatic difference.  The memory exhaustion is gone and overall
performance is very acceptable even while backups are running.
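For anyone who wants to apply the same setting across several mounts, here
is a minimal Python sketch. It assumes the "gfs_tool settune <mountpoint>
<tunable> <value>" form described in Wendy's readme; the mount points and
the helper names settune_command/apply_tunable are hypothetical. By default
it only prints the commands (dry run) and runs them when dry_run=False.

```python
from __future__ import annotations

import shlex
import subprocess
from typing import Sequence

# Hypothetical mount points; substitute your own GFS mounts.
GFS_MOUNTS = ["/gfs/home", "/gfs/data"]


def settune_command(mount: str, tunable: str, value: int) -> list[str]:
    """Build the argv for: gfs_tool settune <mount> <tunable> <value>."""
    return ["gfs_tool", "settune", mount, tunable, str(value)]


def apply_tunable(mounts: Sequence[str], tunable: str, value: int,
                  dry_run: bool = True) -> list[list[str]]:
    """Set one tunable on every mount; print instead of running if dry_run."""
    cmds = [settune_command(m, tunable, value) for m in mounts]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if gfs_tool fails.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    apply_tunable(GFS_MOUNTS, "glock_purge", 50)
```

The same helper can build the demote_secs commands discussed below, e.g.
apply_tunable(GFS_MOUNTS, "demote_secs", 600).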

If you are not at Update 6 yet, I would urge you to upgrade as soon as
possible to take advantage of this new feature.

Regards,

Paul McDowell
Celera

From: Wendy Cheng <wcheng at redhat.com>
Sent by: linux-cluster-bounces at redhat.com
Date: 01/04/2008 11:04 AM
Reply-To: linux clustering <linux-cluster at redhat.com>
To: linux clustering <linux-cluster at redhat.com>
Subject: Re: [Linux-cluster] GFS performance

Kamal Jain wrote:
> Feri,
>
> Thanks for the information.  A number of people have emailed me
> expressing some level of interest in the outcome of this, so hopefully I
> will soon be able to do some tuning and performance experiments and report
> back our results.
>
> On the demote_secs tuning parameter, I see you're suggesting 600
> seconds, which appears to be longer than the default 300 seconds as stated
> by Wendy Cheng at
> http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
> -- we're running RHEL4.5.  Wouldn't a SHORTER demote period be better for
> lots of files, whereas perhaps a longer demote period might be more
> efficient for a smaller number of files being locked for long periods of
> time?
>

This demote_secs tunable is a little bit tricky :) ... What happens here
is that GFS caches glocks, and their count can grow very large. Unless the
VM releases the inodes (files) associated with these glocks, the current
GFS internal daemons do *fruitless* scans trying to remove these glocks
(but never succeed). If you set demote_secs to a large number, it
*reduces* the wake-up frequency of the daemons doing this fruitless work,
which in turn leaves more CPU cycles for real work. Without the glock
trimming patch in place, that is a way to tune a system that constantly
touches a large number of files (such as rsync). Ditto for the "scand"
wake-up interval: making it larger will help performance in this
situation.

With the *new* glock trimming patch, we actually drop the memory
reference count, so an idle glock can be "demoted" and subsequently
removed from the system. To demote glocks, we need the gfs_scand daemon
to wake up often, which implies we need a smaller demote_secs for the
patch to be effective.
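To make the trade-off concrete, here is a toy Python model (not GFS code;
the names Glock and scan_pass are hypothetical). It simulates a single
scan pass and assumes, as in the scenario above, that without the patch
every glock is pinned by an inode the VM is still caching, while with the
patch that reference is dropped. It does not model how often gfs_scand
wakes up or the glock_purge percentage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Glock:
    last_used: float  # time (s) of the glock's last use


def scan_pass(glocks: list[Glock], now: float, demote_secs: float,
              trimming_patch: bool) -> list[Glock]:
    """One toy scan pass; returns the glocks still cached afterwards."""
    survivors = []
    for g in glocks:
        # Toy assumption: without the patch, a cached inode holds a
        # reference on the glock, so it is pinned however long it is idle.
        pinned = not trimming_patch
        idle_long_enough = now - g.last_used >= demote_secs
        if idle_long_enough and not pinned:
            continue  # demoted and dropped from the cache
        survivors.append(g)
    return survivors


if __name__ == "__main__":
    # A backup-like burst: one glock touched per second for 1000 seconds.
    glocks = [Glock(last_used=float(t)) for t in range(1000)]
    for patch, demote in [(False, 300), (True, 300), (True, 600)]:
        left = scan_pass(glocks, now=1000.0, demote_secs=demote,
                         trimming_patch=patch)
        print(f"patch={patch} demote_secs={demote}: {len(left)} cached")
```

In this model the unpatched pass frees nothing (1000 glocks remain, the
"fruitless" scan), while with the patch demote_secs=300 leaves 299 glocks
and demote_secs=600 leaves 599, i.e. a smaller demote_secs trims more.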

> On a related note, I converted a couple of the clusters in our lab from
> GULM to DLM and while performance is not necessarily noticeably improved
> (though more detailed testing was done after the conversion), we did
> notice that both clusters became more stable in the DLM configuration.
>
This is mostly because DLM is the current default lock manager (with
ongoing development efforts), while GULM is not actively maintained.

-- Wendy

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
