
Re: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel



Some progress...

We had another dlm_sendd lockup yesterday, which prompted us to rework our file sharing. Previously we had both SMB and NFS services competing for GFS resources on this particular node. We suspected this combination might have provoked the lockups... so we moved things around with the help of another server in our GFS cluster.

Previously we had:

Machine A (nfs and smb services sitting on top of gfs)
NFS  SMB
GFS

And switched things around to this:

Machine A
SMB
NFS -> Machine B

Machine B
NFS
GFS

Basically, we moved all NFS shares to Machine B, so NFS is the only file-sharing service using GFS on that machine, and changed Machine A to use an NFS mount from Machine B. This way no node has both SMB and NFS services running on top of GFS.

Previously we had 1-2 lockups a day, but today nothing... so far so good. Not sure if this configuration will work for you... let me know if you need any further clarification.

d


On 1-Apr-08, at 5:51 PM, Andrew A. Neuschwander wrote:

My symptoms are similar. dlm_send sits on all of the CPU. Top shows the
CPU spending nearly all of its time in sys or interrupt handling. Disk
and network I/O aren't very high (as seen via iostat and iptraf), but
SMB/NFS throughput and latency are horrible. Context switches per second
as seen by vmstat are in the 20,000+ range (I don't know if this is high,
though; I haven't really paid attention to this in the past). Nothing
crashes, it is still able to serve data (very slowly), and eventually
the load and latency recover.
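As a rough way to track the context-switch rate without vmstat, here is a minimal Python sketch that samples the cumulative `ctxt` counter in Linux's /proc/stat twice and divides the difference by the elapsed time. As far as we know, this is the same kernel counter vmstat's `cs` column is derived from. The helper names (`read_ctxt`, `ctxt_rate`, `sample_ctxt_rate`) are hypothetical, and the sampling function only works on Linux.

```python
import time


def read_ctxt(stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found in /proc/stat data")


def ctxt_rate(before: int, after: int, seconds: float) -> float:
    """Context switches per second between two samples of the counter."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    return (after - before) / seconds


def sample_ctxt_rate(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Read the counter twice, roughly `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_ctxt(f.read())
    t0 = time.monotonic()
    time.sleep(interval)
    with open(path) as f:
        second = read_ctxt(f.read())
    # Use the measured elapsed time rather than trusting sleep() exactly.
    elapsed = time.monotonic() - t0
    return ctxt_rate(first, second, elapsed)
```

For example, `print(round(sample_ctxt_rate(5.0)))` would report the average rate over five seconds; a longer interval smooths out short bursts.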

As an aside, does anyone know how to _view_ the resource group size after
file system creation on GFS?

Thanks,
-Andrew


On Tue, April 1, 2008 6:30 pm, David Ayre wrote:
What do you mean by "pounded", exactly?

We have an ongoing, similar issue... when we have about a dozen users
using both SMB and NFS, at some seemingly random point in time our
dlm_sendd chews up 100% of the CPU... then dies down on its own
after quite a while. Killing SMB processes or shutting down SMB didn't
seem to have any effect... only a reboot cures it. I've seen this
described (if this is the same issue) as a "soft lockup", as it does
seem to come back to life:
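To confirm which dlm kernel thread is actually burning the CPU (it shows up as dlm_sendd or dlm_send in this thread, depending on the kernel), one option is to read the utime and stime counters from /proc/<pid>/stat. The sketch below uses hypothetical helper names and is not part of any cluster tool; it assumes the Linux /proc/<pid>/stat layout, where utime and stime are fields 14 and 15, measured in clock ticks.

```python
import os
from typing import Dict, Tuple


def parse_stat(stat_text: str) -> Tuple[str, int]:
    """Return (comm, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    comm = stat_text[lpar + 1:rpar]
    rest = stat_text[rpar + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return comm, utime + stime


def cpu_ticks_by_name(prefix: str = "dlm_") -> Dict[str, int]:
    """Sum CPU ticks per comm for every process whose comm starts with `prefix` (Linux only)."""
    totals: Dict[str, int] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, ticks = parse_stat(f.read())
        except OSError:
            continue  # process exited or is unreadable; skip it
        if comm.startswith(prefix):
            totals[comm] = totals.get(comm, 0) + ticks
    return totals
```

Sampling `cpu_ticks_by_name()` twice, N seconds apart, and dividing each difference by `os.sysconf("SC_CLK_TCK") * N` gives the fraction of one CPU each dlm thread used; a value near 1.0 would match the "100% of the CPU" symptom.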

http://lkml.org/lkml/2007/10/4/137

We've been assuming it's a kernel/dlm version issue, as we are running
2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8.

We were going to try a kernel update this week... but you seem to be
using a later version and still have this problem?

Could you elaborate on "getting pounded by dlm"? I've posted about
this on this list in the past but received no assistance.




On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote:

I have a GFS cluster with one node serving files via SMB and NFS. Under
fairly light usage (5-10 users) the CPU is getting pounded by dlm. I am
using CentOS 5.1 with the included kernel (2.6.18-53.1.14.el5). This
sounds like the dlm issue mentioned back in March of last year
(https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html)
that was resolved in 2.6.21.
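The version reasoning here can be sketched in a few lines: the el5 release string starts with an upstream base of 2.6.18, which predates 2.6.21. The hypothetical Python helpers below compare that base against the release carrying the fix. Distribution kernels routinely backport patches without changing the base version, so a True result only says the upstream base is older; it does not prove the fix is missing.

```python
import re
from typing import Tuple


def upstream_base(release: str) -> Tuple[int, ...]:
    """Extract the upstream base version, e.g. '2.6.18-53.1.14.el5' -> (2, 6, 18)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. '3.0') is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def predates_upstream_fix(release: str, fixed_in: Tuple[int, ...] = (2, 6, 21)) -> bool:
    """True if the kernel's upstream base is older than the release with the fix.

    Caveat: vendor kernels (el5 and similar) may carry backported fixes,
    so this cannot tell you whether the fix is actually present.
    """
    return upstream_base(release) < fixed_in
```

On the node itself, `platform.release()` returns the same string as `uname -r`, so `predates_upstream_fix(platform.release())` can be run directly.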

Has this fix been (or will it be) backported to the current el5 kernel?
Will it be in RHEL 5.2? What is the easiest way for me to get this fix?

Also, if I try a newer kernel on this node, will there be any harm in
the other nodes continuing to run their current kernel?

Thanks,
-Andrew
--
Andrew A. Neuschwander, RHCE
Linux Systems Administrator
Numerical Terradynamic Simulation Group
College of Forestry and Conservation
The University of Montana
http://www.ntsg.umt.edu
andrew ntsg umt edu - 406.243.6310

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~
David Ayre
Programmer/Analyst - Information Technology Services
Emily Carr Institute of Art and Design
Vancouver, B.C.   Canada
604-844-3875 /  david eciad ca
