[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Linux-cluster] GFS hangs, nodes die



Hi Marc,

thank you for your answer. Fencing works fine, and everything else also
works for long periods, except when the I/O rises above a certain level...
The kernel, though, could indeed be the problem! 2.6.9-55.0.2 is the
standard Update 5 kernel for CentOS as well, but I had to downgrade because
all the cs/gfs packages were dependent on 2.6.9-55.0

I totally forgot about it... I will compile everything for the new
kernel now, let's see.
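For what it's worth, the mismatch that forced the downgrade can be checked mechanically. A minimal sketch (the helper name is mine, and the version strings are just the ones from this thread):

```shell
# Minimal sketch: compare the kernel release a cs/gfs module package was
# built against with the kernel you intend to boot. The helper name is
# hypothetical; the version strings are the ones from this thread.
module_matches_kernel() {
    pkg_kver="$1"   # kernel the module package was built against
    run_kver="$2"   # kernel you plan to run
    [ "$pkg_kver" = "$run_kver" ]
}

if module_matches_kernel "2.6.9-55.0" "2.6.9-55.0.2"; then
    echo "match"
else
    echo "mismatch: rebuild the modules or keep the older kernel"
fi
```

On a live box the two inputs would come from querying the installed module package with rpm and from `uname -r`.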

Regards,
Sebastian

PS: Btw, the hanging df processes came from the daily logwatch...
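Such stuck df runs typically sit in uninterruptible sleep (D state). A sketch of the filter, fed canned ps-style output so it stays self-contained (the PIDs and paths are made up):

```shell
# Filter a ps listing for D-state (uninterruptible sleep) processes.
# The sample output below is canned so the filter itself is visible;
# on a real node you would pipe `ps -eo pid,stat,comm` in instead.
sample_ps='  PID STAT COMMAND
 1234 D    df -k /mnt/gfs
 1235 S    bash
 1236 D+   df -h'

printf '%s\n' "$sample_ps" | awk 'NR > 1 && $2 ~ /^D/ {print $1, $3}'
```

This prints the PID and command of the two hung df entries from the sample.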


Marc Grimme wrote:
> Hi Sebastian,
> just to double check. Fencing and everything works as expected, right?
>
> Second, the latest RHEL4 kernel is 2.6.9-55.0.2 (is that also available for
> CentOS?). If so, you might think about updating. I'm not sure whether anything
> was updated within dlm/gfs, but my tests were done with 2.6.9-55.0.2 and I
> didn't encounter those problems, whereas before I had huge numbers of locks
> (~2 times the number of files on the fs).
>
> On Friday 24 August 2007 12:37:15 Sebastian Walter wrote:
>   
>> Hi list,
>>
>> just an update. In my scripts, there is nothing about searching the
>> whole file system, but I see several "df" processes blocking the system
>> with 100 % CPU. I will update firmwares now and check for better QLogic
>> drivers. Thanks!
>>     
> I fear that a firmware update will not change anything, but it's always a good 
> option ;-) . I also have doubts about the QLogic drivers, because the ones in 
> 2.6.9-55 are quite OK (did you configure multipathing properly?).
> Are the df and everything else running concurrently on different nodes?
>
> Last but not least, are the "unable to obtain locks" messages the only 
> messages you see when the problems occur?
>
> Regards Marc.
>   
>> Regards,
>> Sebastian
>>
>> Marc Grimme wrote:
>>     
>>> On Tuesday 21 August 2007 09:52:32 Sebastian Walter wrote:
>>>       
>>>> Hi,
>>>>
>>>> Marc Grimme wrote:
>>>>         
>>>>> Do you also see any messages on the console of the nodes? The gfs_tool
>>>>> counters would also help before the problem occurs, so let it run for a
>>>>> while beforehand to see if the locks increase.
>>>>> What kind of stress tests are you doing? I bet searching the whole
>>>>> filesystem. What makes me wonder is that gfs_tool glock_purge does not
>>>>> work, whereas it worked for me with exactly the same problems. Did you
>>>>> set it _AFTER_ the fs was mounted?
>>>>>           
>>> Sorry, I meant: after is right and before is not ;-( .
>>> And are you using the latest version of CS/GFS?
>>> Do you have a lot of memory in your machines 16G or more?
>>>
>>>       
>>>> That makes me optimistic. I set it after the volume was mounted, so I
>>>> will give it another try, setting it before mounting. Then I will also
>>>> mail myself the output of the counters every 10 minutes. Let's see...
>>>>         
>>> I would be interested in the counters.
>>> Also add the process list, to see how much CPU time gfs_scand consumes,
>>> e.g.:
>>> ps axwwww | sort -k4 -n | tail -10
>>>
>>> Have fun Marc.
>>>
>>>       
>>>> ...with best thanks
>>>> Sebastian
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster redhat com
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>>         
>
>
>
>   
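For completeness, the glock_purge and counters steps discussed in this thread might look roughly like this (a sketch only: the gfs_tool tunable name is assumed from RHEL4-era GFS, the mountpoint is a stand-in, and the dry-run wrapper just echoes what would be executed):

```shell
# Dry-run sketch of the tuning discussed above. Assumptions: RHEL4-era
# gfs_tool with a glock_purge tunable; /mnt/gfs is a stand-in mountpoint.
MNT=/mnt/gfs
PURGE_PCT=50   # ask gfs_scand to try purging ~50% of unused glocks

run() { echo "would run: $*"; }   # replace the echo with "$@" on a live node

run gfs_tool settune "$MNT" glock_purge "$PURGE_PCT"   # set after mounting
run gfs_tool counters "$MNT"                           # watch lock counts
```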

