[Linux-cluster] GFS 6.0 Questions

Gerald G. Gilyeat ggilyeat at jhsph.edu
Tue Feb 15 17:59:46 UTC 2005


Thanks a bunch. 
The direction I was leaning toward, then, seems appropriate. I love it when things start coming together.

Is there any way to get some of these undocumented tunables, well, documented? I couldn't for the life of me find anything indicating whether the lock highwater mark was runtime-tunable, for example. 
There is -some- concern about memory usage tanking things, but that will probably lead us to simply move to dedicated lock servers instead of running them on the actual shared production machines (and really, we'd only need two...'f1' is strictly for management-type work and backups...)
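
For the record, a minimal cluster.ccs sketch of what I'm picturing (the "lock-1"/"lock-2" names are made-up placeholders for whatever the dedicated boxes end up being called, and I'm assuming gulm still wants an odd number of servers for quorum):

cluster {
  ....
  lock_gulm {
    servers = ["lock-1", "lock-2", "f1"]
    ....
  }
}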

Finally - so while it's -possible- to have the GFS "stuff" on a separate interface (and yes, it was a royal PITA getting it to work in the first place, what with multiple NICs already...), it's not something that's at all easy to do, at least until the mentioned fix drops?  bleh.

Thanks!

--
Jerry Gilyeat, RHCE
Systems Administrator
Molecular Microbiology and Immunology
Johns Hopkins Bloomberg School of Public Health



-----Original Message-----
From: linux-cluster-bounces at redhat.com on behalf of Michael Conrad Tadpol Tilstra
Sent: Tue 2/15/2005 12:43 PM
To: linux clustering
Subject: Re: [Linux-cluster] GFS 6.0 Questions
 
Gerald G. Gilyeat wrote:

[snip]
> First, the GFS side of things is currently sharing the cluster's 
> internal network for its communications, mostly because we didn't have 
> a second switch to dedicate to the task. While the cluster is currently 
> lightly used, how sub-optimal is this? I'm currently searching for 
> another switch that a partnering department has/had, but I don't know if 
> they even know where it is at this point.

It really depends on how much the actual link is used.  The more data 
the other apps push over the ethernet, the less of it gulm can use.  
It is also, unfortunately, rather difficult to tell gulm to use a 
different network device in the current releases.  There is a fix 
pending for this, but it's not out yet.
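
If you want to attempt it anyway: as I understand it, gulm picks its address from whatever the node's name resolves to, so the usual workaround is to point each node name at the private-interface IP in /etc/hosts on every node (the addresses below are made-up examples, and this assumes the names match what's in your nodes.ccs):

10.0.0.1   front-1
10.0.0.2   e0
10.0.0.3   f0

No promises, though; this is exactly the part the pending fix is supposed to make less painful.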

> Second: GFS likes to fence "e0" off on a fairly regular basis 
> (once every other week or so, if not more often). This is really rather 
> bad for us from an operational standpoint - e0 is vital to the 
> operation of our Biostatistics Department (Samba/NFS, user 
> authentication, etc...). There is also some pretty nasty latency on 
> occasion, with logins taking upwards of 30 seconds to return to a prompt, 
> provided they don't time out to begin with.

If the machine is seeing that kind of delay, it is entirely possible 
that the delay is also causing heartbeats to be missed.

> In trying to figure out -why- it's constantly being fenced off, and in 
> trying to solve the latency/performance issues, I've noticed a -very- 
> large number of "notices" from GFS like the following:
> Feb 15 10:56:10 front-1 lock_gulmd_LT000[4073]: Lock count is at 1124832 
> which is more than the max 1048576. Sending Drop all req to clients
> 
> Easy enough to gather that we're blowing away the current lock highwater 
> mark.
> Is upping the highwater point a feasible thing to do -and- would it have 
> an effect on performance, and what would that effect be?

cluster.ccs:
cluster {
  lock_gulm {
    ....
    lt_high_locks = <int>
  }
}

The highwater mark is an attempt to keep down the amount of memory 
lock_gulmd uses.  When the highwater mark is hit, the lock server tells 
all gfs mounts to try to release locks, and it does this every 10 
seconds until the lock count falls below the mark.  Those drop requests 
cost cycles, so raising the mark means fewer cycles spent on them.  On 
the other hand, the higher the highwater mark, the more memory the gulm 
lock servers and gfs will use to store locks.  The number is just a 
count of locks (in <=6.0), not an actual measure of the RAM used.

In short: in your case, a higher highwater mark may give some performance 
gain, at the cost of some memory available to other programs.
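
For example (the value is purely illustrative, not a tested recommendation), doubling the 1048576 default from your log message would look like:

cluster {
  lock_gulm {
    ....
    lt_high_locks = 2097152
  }
}

Your log shows the count peaking around 1.1 million, so anything comfortably above that should make the drop-all requests much rarer.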


> This weekend, we also noticed another weirdness (for us, anyway...) - 
> e0 was fenced off on Saturday morning at 0504.09am; almost precisely 24 
> hours later, e0 decided that the problem was the previous GFS master 
> (f0), arbitrated itself to be Master, took over, fenced off f0, and then 
> proceeded to hose the entire thing by the time I heard about it and 
> was able to get on-site to bring it all back up (at 1am Monday morning). 
> What is this apparent 24-hour timer, and is this expected behaviour?

No, that sounds like some kind of freak chance.  A very icky thing 
indeed.  It very much sounds like a higher heartbeat_rate is needed.

> Finally - would increasing the heartbeat timer and the number of 
> acceptable misses be an appropriate and acceptable way to help decrease 
> the frequency of e0 being fenced off?

Certainly.  The default values for the heartbeat_rate and allowed_misses 
are just suggestions.  Certain setups may require different values, and 
as far as I know the only way to figure this out is to try it.  Sounds 
very much like you could use larger values.
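
For illustration, raising both in cluster.ccs would look something like this (the numbers are just a starting point to experiment with, not tested recommendations):

cluster {
  lock_gulm {
    ....
    heartbeat_rate = 30.0
    allowed_misses = 3
  }
}

The trade-off is that larger values make fencing less trigger-happy, but a node that really has died will take longer to get noticed and fenced.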

-- 
michael conrad tadpol tilstra
<my wit is my doom>