[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Linux-cluster] GFS 6.0 lt_high_locks value in cluster.ccs



Chris Feist wrote:

Yes, issue #2 could definitely be the cause of your first issue. Unfortunately you'll need to bring down your cluster to change the value of lt_high_locks. What is its value currently? And how much memory do you have on your gulm lock servers? You'll need about 256M of RAM for gulm for every 1 Million locks (plus enough for any other process and kernel).

On each of the gulm clients you can also cat /proc/gulm/lockspace to see which client is using most of the locks.
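That 256M-per-million sizing rule works out roughly as follows (a minimal sketch; the lock count below is just illustrative, not taken from any real server):

```shell
# Rough sketch of the sizing rule above: ~256 MB of RAM per 1 million
# gulm locks. The lock count here is illustrative.
locks=1048576                        # e.g. a lock count to size for
ram_mb=$(( (locks * 256) / 1000000 ))
echo "approx ${ram_mb} MB RAM for ${locks} locks"
```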

Thanks for the response! I figured I would probably have to bring down the cluster to change the highwater setting, but I was hoping it could be changed dynamically. Oh well.

The value is currently at the default, which I want to say is something like 1.04M. These machines are both lock servers and Samba/NFS servers, and each has 4GB of RAM (there are three lock servers in the cluster, all with 4GB).

A previous Red Hat service call has me running the hugemem kernel on all three; the issue there was that, under even light load, lowmem would be exhausted and the machines would enter an OOM spiral of death. Now that I have turned off hyperthreading, though, memory usage seems dramatically lower than it was before that change. For instance, the machine running Samba services has been up since I turned off hyperthreading on Friday night, and today it was under some pretty heavy load. On a normal day, before the hyperthreading change, I'd be down to maybe 500MB of lowmem free by now (out of 3GB), and the only way to completely reclaim that memory would be to reboot. Right now, the same machine has 3.02GB of 3.31GB free. I'll have to let this run for a while to rule out a red herring, but it looks much better than it ever has in the past.
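For reference, raising the highwater mark means editing the lock_gulm section of cluster.ccs and restarting the lock servers. A sketch of what that section might look like (the cluster name, server names, and raised value here are all hypothetical; check the GFS 6.0 documentation for the exact syntax on your release):

```
cluster {
    name = "mycluster"                        # hypothetical cluster name
    lock_gulm {
        servers = ["node1", "node2", "node3"] # the three lock servers
        lt_high_locks = 2097152               # hypothetical raised highwater value
    }
}
```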

Here's the interesting output from the /proc/gulm gadgets (note that, at the time I grabbed these, I was seeing the "more than the max" message logged to syslog between once and twice per minute, but not at the 10-second rate that I read about previously):

[root xxxxx root]# cat /proc/gulm/filesystems/data0
Filesystem: data0
JID: 0
handler_queue_cur: 0
handler_queue_max: 26584
[root xxxxx root]# cat /proc/gulm/filesystems/data1
Filesystem: data1
JID: 0
handler_queue_cur: 0
handler_queue_max: 4583
[root xxxxx root]# cat /proc/gulm/filesystems/data2
Filesystem: data2
JID: 0
handler_queue_cur: 0
handler_queue_max: 11738
[root xxxxx root]# cat /proc/gulm/lockspace

lock counts:
  total: 41351
    unl: 29215
    exl: 3
    shd: 12055
    dfr: 0
pending: 0
   lvbs: 16758
   lops: 12597867

[root xxxxx root]#
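A quick way to see how close the lock total is getting to the highwater mark, assuming the /proc/gulm/lockspace format shown above (the 1048576 threshold is an assumed default, not confirmed from this cluster):

```shell
#!/bin/sh
# Report the gulm lock total as a percentage of a highwater threshold.
# Reads /proc/gulm/lockspace-style text on stdin; the "high" value is
# an assumed default, so substitute your configured lt_high_locks.
awk -v high=1048576 '
    /total:/ { total = $2 }
    END { printf "total=%d high=%d pct=%.1f%%\n", total, high, 100*total/high }
' < /proc/gulm/lockspace
```

With the numbers above (total 41351), this would report roughly 3.9% of a 1048576 highwater mark.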
