[Linux-cluster] GFS2: Huge number of glocks

Steven Whitehouse swhiteho at redhat.com
Mon Sep 28 09:02:05 UTC 2009


Hi,

On Fri, 2009-09-25 at 15:18 -0400, Allen Belletti wrote:
> Hi All,
> 
> I had posted about this once before and didn't get a response.  Was
> really hoping that Steven or another person who's involved might be able
> to comment at least briefly.  It would really help to know if this is
> "normal" or not.
> 
> Thanks,
> Allen
> 
> -------------
> 
> I've been running GFS and now GFS2 for several years on a two-node mail
> cluster, generally with good results, especially once GFS2 became
> production ready and we upgraded.  However from time to time (ranging
> from a few days to a month), we'll get a "stuck" lock on one particular
> file or another which then blocks a user from their mail.  While looking
> into this, I've recently become aware of a VERY large number of glocks
> being left behind after our nightly rsync backups.  I'm checking on the
> lock situation with "gfs2_tool lockdump /home" and counting locks by
> piping through "grep ^G | wc -l".  We have two GFS2 filesystems
> mounted.  On one of them, the number of glocks returns to "normal" after
> the backup (currently showing about 5400.)  On the other, it stays very
> high although it will drop somewhat throughout the day.  Currently I am
> seeing over 500,000.  Given the ten minutes or so that it takes to list
> them, this seems like it can't be great for performance.
> 
Listing glocks via the sysfs interface is not very efficient. It is
however rather better than it used to be, due to the reduced amount of
text per glock which is generated (compared with gfs1 for example).

> Most of the locks look like this:
> 
> G:  s:SH n:5/b25806 f: t:SH d:EX/0 l:0 a:0 r:3
> H: s:SH f:EH e:0 p:31042 [(ended)] gfs2_inode_lookup+0x114/0x1f0 [gfs2]
> 
It's a shared lock that is being cached in case of future use. It is
harmless, tbh, and looks normal to me.
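
If you want a breakdown rather than a single total from "grep ^G | wc -l",
something along these lines could tally the glocks by type and state. It
is only a rough sketch: it assumes every glock line has the same
"key:value" layout as the sample quoted above, and the function names
(parse_glock_line, summarize) are made up for illustration, not part of
any GFS2 tool.

```python
import re
from collections import Counter

# Matches the "key:value" fields seen in the sample line, e.g. "s:SH",
# "n:5/b25806" or an empty "f:". The layout is assumed from that one sample.
FIELD_RE = re.compile(r"(\w+):(\S*)")


def parse_glock_line(line):
    """Return the fields of a 'G:' line as a dict, or None for other lines."""
    if not line.startswith("G:"):
        return None  # holder ("H:") lines and anything else are skipped
    return dict(FIELD_RE.findall(line[2:]))


def summarize(lines):
    """Count glocks keyed by (type, state), e.g. ("5", "SH")."""
    counts = Counter()
    for line in lines:
        fields = parse_glock_line(line)
        if fields is None:
            continue
        # "n:" holds "type/number"; keep only the part before the slash.
        lock_type = fields.get("n", "?").split("/")[0]
        counts[(lock_type, fields.get("s", "?"))] += 1
    return counts
```

Feeding it the lockdump output (for example summarize(sys.stdin) with
"gfs2_tool lockdump /home" piped in) gives one count per type/state pair,
which makes it easier to see whether the bulk of the 500,000 are cached
shared locks like the one above.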

> Note that the pid (31042 in this case) corresponds to one of the
> completed rsync processes which generated the locks in the first place.
> 
The holder relates to an inode which was looked up by the pid in
question. It will continue to exist until the inode is pushed out of
cache.

> My questions are 1) Is this a bad thing?  My gut feeling is "yes" but
> perhaps the system is highly efficient in dealing with these locks, and
Generally it's a good thing. Each of those cached locks relates to disk
I/O which does not need to be done if the same inode is accessed in
future.

> 2) Can anything be done about it?  The tuning opportunities in GFS2 are
> very limited compared to GFS, and the few things I've tried seem to have
> no effect.
> 
That is deliberate policy - the idea is to be self tuning. If you read
in every inode in the filesystem (which rsync tends to do) then you are
going to fill the cache on the node, just the same as if you did the
same thing to a single node fs. The difference is that in the cluster
case that makes subsequent accesses to the same inode on the node that
did the rsync much faster, and subsequent accesses from a different node
much slower (if they are write accesses and thus require exclusive
locks).

If that is a problem then making a VFS drop caches request after the
rsync might well help prevent/reduce some of the symptoms.
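
As a rough illustration of what I mean, a backup wrapper could do
something like the following once rsync has finished. This is a sketch
under assumptions, not a tested recipe: it relies on the standard
/proc/sys/vm/drop_caches interface (where writing "2" asks the kernel to
free reclaimable dentries and inodes), it needs root, and the function
name drop_inode_cache is invented for the example.

```python
import os

# Standard Linux VM control file; writing to it requires root.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_inode_cache(path=DROP_CACHES, mode="2"):
    """Ask the kernel to drop clean cached objects.

    mode "1" = page cache, "2" = dentries and inodes, "3" = both.
    The path is a parameter only so the function can be tried against
    an ordinary file.
    """
    if mode not in ("1", "2", "3"):
        raise ValueError("mode must be '1', '2' or '3'")
    # Flush dirty data first: drop_caches only frees clean objects.
    os.sync()
    with open(path, "w") as f:
        f.write(mode + "\n")
```

Calling drop_inode_cache() at the end of the nightly backup should push
the inodes that rsync read out of cache on that node, and the cached
glocks attached to them should go with them. How much it helps will
depend on the workload.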

There are further improvements that we can make. One of the big issues
is the amount of writeback that requests to drop locks can cause
(assuming cached dirty data). Also multiple requests to drop locks from
many nodes at once tend not to produce an efficient pattern of I/O.
Solving that problem though is hard, and something that we would like to
do, but may take some time.

> By the way, I am running with plock_ownership="1" and
> plock_rate_limit="0" in cluster.conf.
> 
> Thanks in advance,
> Allen
> 
I'd suggest turning off plock_ownership unless you are on a very
up-to-date kernel, as it is broken on some early kernels. Having
plock_rate_limit=0 is a good plan though. Unless you are using an
application which uses plocks (I don't know if rsync does or not),
these settings will not make any difference anyway.

Steve.




