
Re: [Linux-cluster] dlm high cpu on latest stock centos 5.1 kernel



My symptoms are similar. dlm_send sits on all of the CPU. Top shows the
CPU spending nearly all of its time in sys or interrupt handling. Disk
and network I/O aren't very high (as seen via iostat and iptraf), but
SMB/NFS throughput and latency are horrible. Context switches per second
as seen by vmstat are in the 20,000+ range (I don't know if this is high,
though; I haven't really paid attention to this in the past). Nothing
crashes, and the node is still able to serve data (very slowly), and
eventually the load and latency recover.
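
For anyone who wants to compare context-switch numbers without eyeballing
vmstat, here is a minimal sketch that derives the same per-second rate from
the cumulative "ctxt" counter in /proc/stat. The function names (parse_ctxt,
ctxt_rate, sample_rate) are made up for illustration, and this is only a
rough approximation of what vmstat reports, not a replacement for it.

```python
import time


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        # The kernel exposes total context switches since boot as "ctxt <count>".
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def ctxt_rate(before, after, seconds):
    """Context switches per second between two cumulative readings."""
    return (after - before) / seconds


def sample_rate(seconds=1.0, path="/proc/stat"):
    """Read the counter twice, `seconds` apart, and return the rate (Linux only)."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return ctxt_rate(before, after, seconds)
```

Calling sample_rate(5.0) on the affected node during a slow period and again
when it is idle should give two figures to compare, which says more than the
absolute 20,000+ number alone.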

As an aside, does anyone know how to _view_ the resource group size after
file system creation on GFS?

Thanks,
-Andrew


On Tue, April 1, 2008 6:30 pm, David Ayre wrote:
> What do you mean by pounded exactly?
>
> We have a similar ongoing issue: when we have about a dozen users
> using both smb/nfs, at some seemingly random point in time our
> dlm_sendd chews up 100% of the CPU... then dies down on its own
> after quite a while.  Killing SMB processes or shutting down SMB didn't
> seem to have any effect... only a reboot cures it.  I've seen this
> described (if this is the same issue) as a "soft lockup" as it does
> seem to come back to life:
>
> http://lkml.org/lkml/2007/10/4/137
>
> We've been assuming it's a kernel/dlm version issue, as we are running
> 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8
>
> We were going to try a kernel update this week... but you seem to be
> using a later version and still have this problem?
>
> Could you elaborate on "getting pounded by dlm" ?  I've posted about
> this on this list in the past but received no assistance.
>
>
>
>
> On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote:
>
>> I have a GFS cluster with one node serving files via smb and nfs. Under
>> fairly light usage (5-10 users) the cpu is getting pounded by dlm. I am
>> using CentOS5.1 with the included kernel (2.6.18-53.1.14.el5). This
>> sounds like the dlm issue mentioned back in March of last year
>> (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html)
>> that was resolved in 2.6.21.
>>
>> Has this fix been (or will it be) backported to the current el5 kernel?
>> Will it be in RHEL5.2? What is the easiest way for me to get this fix?
>>
>> Also, if I try a newer kernel on this node, will there be any harm in
>> the other nodes using their current kernel?
>>
>> Thanks,
>> -Andrew
>> --
>> Andrew A. Neuschwander, RHCE
>> Linux Systems Administrator
>> Numerical Terradynamic Simulation Group
>> College of Forestry and Conservation
>> The University of Montana
>> http://www.ntsg.umt.edu
>> andrew ntsg umt edu - 406.243.6310
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster redhat com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~
> David Ayre
> Programmer/Analyst - Information Technology Services
> Emily Carr Institute of Art and Design
> Vancouver, B.C.   Canada
> 604-844-3875 /  david eciad ca
>
> --
> Linux-cluster mailing list
> Linux-cluster redhat com
> https://www.redhat.com/mailman/listinfo/linux-cluster

