
Re: [Linux-cluster] GFS2 directory hangs on one node CentOS 5.3



Hi,
>Hi,
>
>On Sat, 2009-09-26 at 18:29 +0200, Libor Tomsik wrote:
>> Hi all,
>>
>> I'm having a strange issue with a two-node cluster based on Xen
>> virtual hosts with a shared disk on CLVM. The servers are running Apache
>> and one is considered a hot backup. On that node awstats statistics are
>> computed from the Apache custom logs stored on the shared device. Web data,
>> logs, configs and awstats results are in different directories within
>> the same GFS2 volume.
>>
>> Everything works fine, but sometimes (in the production environment, damn)
>> the directory with the logs freezes on the spare node running awstats.
>> All commands like ls, cd, mc on that directory go into state D. On the
>> second node everything works fine. Other directories seem unaffected too.
>>
>> I can neither umount the fs nor remount it ro and back rw, since there are
>> "running" processes in D state.
>>
>> Can someone give me some advice on how to prevent this problem? And
>> how to recover from it? It is a production system with an SLA :(  Next
>> time, I'll try to take a lock dump on both nodes.
>>
>> Kernel is 2.6.18-128.1.10.el5xen, gfs2-utils-0.1.53-1.el5_3.2,
>> kmod-gfs2-xen-1.92-1.1.el5_2.2
>>
>> Regards
>>
>> Libor
>>
>That sounds to me like there is a lot of activity from both nodes
>relating to the same directory. Can you split the logs of the two nodes
>into two different directories? That will probably solve the problem.
>
Actually there is just one Apache writing, on one server, though in many
threads. Maybe this is the problem? I have about 40 sites hosted
there, so 2x40 separate log files.
The second node just periodically reads this directory.
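
For the next hang, the stuck processes can be listed by reading /proc
directly, so that no command has to touch the frozen directory. A minimal
sketch, assuming Python 3 is available on the node; the function names
parse_stat and d_state_processes are made up for illustration:

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    found = []
    for name in sorted(os.listdir(proc_root)):
        if not name.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found

# Usage on the affected node:
#   for pid, comm in d_state_processes():
#       print(pid, comm)
```

Running this on both nodes at the time of the hang would at least record
which commands are blocked, alongside the lock dump.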

>This kind of problem is tricky to debug since the glock dumps will tell
>you what state the glocks are currently in, and not what has been
>happening in the past.
>
>In the upstream code we've now got GFS2 tracepoints which will help in
>tracking down issues like this, but those are not in RHEL yet.
>
>Steve.
>
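
Since a single glock dump only shows the current state, one option is to
save timestamped copies at intervals, so that the dumps taken before and
during a hang can be compared afterwards. A rough sketch, assuming Python 3,
debugfs mounted at /sys/kernel/debug, and the glock dump exposed at
/sys/kernel/debug/gfs2/<cluster:fsname>/glocks (the path may differ on this
kernel; snapshot_glocks and the output directory are hypothetical names):

```python
import os
import shutil
import time


def snapshot_glocks(src, out_dir, stamp=None):
    """Copy the glock dump at src into out_dir under a timestamped name."""
    if stamp is None:
        stamp = time.strftime("%Y%m%dT%H%M%S")
    os.makedirs(out_dir, exist_ok=True)
    dest = os.path.join(out_dir, "glocks-%s.txt" % stamp)
    # debugfs files report a size of 0, so stream the contents
    # instead of relying on a size-based copy.
    with open(src) as fin, open(dest, "w") as fout:
        shutil.copyfileobj(fin, fout)
    return dest

# Usage (assumed path, placeholder filesystem name, run as root):
#   src = "/sys/kernel/debug/gfs2/mycluster:myfs/glocks"
#   while True:
#       snapshot_glocks(src, "/var/tmp/glock-snapshots")
#       time.sleep(60)
```

The snapshots should go to a local disk, not to the GFS2 volume itself,
otherwise the collector could end up stuck on the same locks.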
>> --
>> Linux-cluster mailing list
>> Linux-cluster redhat com
>> https://www.redhat.com/mailman/listinfo/linux-cluster

Regards

Libor.

