RE: [Linux-cluster] GFS2 and D state HTTPD processes

Hi Steve,

> > I’ve tuned the demote_secs down from 300 to 20 seconds on the
> > assumption that file locking is causing an issue.
> That is unlikely to make any meaningful change and in fact it could well
> hurt performance, depending on the workload.

> >         <gfs_controld plock_ownership="1" plock_rate_limit="0"/>
> >
> Try turning off plock_ownership and see if that fixes the problem

We'll give this a go and see what it does. We did manage to track the latest issue down to a bad script the customer had written, which caused one of the nodes to exhaust all of its available memory. That had a knock-on effect on the lock_dlm process, which was unable to drop its file locks, and the effect then spread to the rest of the cluster as the other nodes became unable to open files.

> There are two things to look at. One is back traces from processes (echo
> 't' > /proc/sysrq-trigger) and the other is the glock dump
> from /sys/kernel/debug/fs/gfs2/glocks. The first tells us what is
> hanging and the second (hopefully) why. Look for glocks with 'W' in the
> flags field (f:) for their holders (H:) and it should be possible to
> correlate them with the processes which are stuck.

Thanks for the above, that's really useful.
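
For anyone following along, here is a rough sketch of how the waiting holders could be pulled out of the glock dump so that their PIDs can be matched against the stuck httpd processes. It assumes the dump layout described in the kernel's GFS2 glock documentation (a "G:" line per glock, followed by indented "H:" holder lines carrying f:, e:, p: and [comm] fields); that layout may differ between kernel versions, and the function name is my own.

```python
import re

# Assumed holder line layout (may vary by kernel version):
#  H: s:SH f:aW e:0 p:4470 [httpd] gfs2_inode_lookup+0x14e/0x260 [gfs2]
HOLDER_RE = re.compile(
    r"^\s*H:\s+s:(?P<state>\S+)\s+f:(?P<flags>\S*)\s+e:(?P<err>-?\d+)"
    r"\s+p:(?P<pid>\d+)\s+\[(?P<comm>[^\]]*)\]"
)
# Glock number, e.g. "n:2/805b" on a G: line.
GLOCK_ID_RE = re.compile(r"\bn:(\S+)")


def waiting_holders(lines):
    """Return (pid, comm, glock_id) for every holder with 'W' in its flags."""
    current_glock = None
    waiters = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("G:"):
            # Remember which glock the following H: lines belong to.
            match = GLOCK_ID_RE.search(stripped)
            current_glock = match.group(1) if match else None
        elif stripped.startswith("H:"):
            match = HOLDER_RE.match(line)
            if match and "W" in match.group("flags"):
                waiters.append(
                    (int(match.group("pid")), match.group("comm"), current_glock)
                )
    return waiters
```

To use it, read /sys/kernel/debug/fs/gfs2/glocks (debugfs must be mounted, and the exact path under that directory depends on the filesystem), pass the lines to waiting_holders(), and compare the returned PIDs with the processes shown in D state by ps.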

> Do you get any messages in the syslog?

Sadly not.

I'm just looking at this page;


and for a webserver, or a group of webservers, with a large number of files comprising the website itself, is it worth increasing the drop_resources_time value so that file locks are flushed faster?


Gavin Conway
Senior Engineer, Operations (Systems Group), UKSolutions

Telephone: 0845 004 1333, option 2
Email: gavin conway uksolutions co uk
Web: http://www.uksolutions.co.uk/
UKS Ltd, Birmingham Road, Studley, Warwickshire, B80 7BG Registered in England Number 3036806

