
[Linux-cluster] Node fenced when mounting gfs



Hi,

I have a two-node cluster running GFS from the RHEL4 CVS tag (pulled on June 1st).
I have several GFS LVMs; one of them now uses 16GB of storage and 371659 inodes (from df -k and df -i).
The other GFS LVMs use fewer inodes.


When only one node is running, everything is OK and I can mount and access all of the GFS LVMs.
The problem is, when both nodes are running and I try to mount that particular LVM, the node that tries to mount it (lincluster2) logs these messages to syslog:


Jun 1 02:46:41 lincluster2 GFS: Trying to join cluster "lock_dlm", "lincluster:newapp"
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: Joined cluster. Now mounting FS...
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Trying to acquire journal lock...
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Looking at journal...
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: jid=1: Done
Jun 1 02:46:43 lincluster2 GFS: fsid=lincluster:newapp.1: Scanning for log elements...


and then it seems to hang. I assume it is spending all of its CPU time scanning log elements.
That would be acceptable if the other node could still tell it was alive. The problem is that it can't.


Jun 1 02:51:33 lincluster1 kernel: CMAN: removing node lincluster2 from the cluster : Missed too many heartbeats
Jun 1 02:51:33 lincluster1 fenced[4365]: lincluster2 not a cluster member after 0 sec post_fail_delay
Jun 1 02:51:33 lincluster1 fenced[4365]: fencing node "lincluster2"
Jun 1 02:51:38 lincluster1 fenced[4365]: fence "lincluster2" success


It seems that lincluster2 is so busy scanning log elements that it cannot even send CMAN heartbeats, which makes lincluster1 think lincluster2 is dead, fence it, and reboot it.
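
One workaround I am considering (I am not sure it is the right fix, and the value below is just a guess) is raising fenced's post_fail_delay in /etc/cluster/cluster.conf, so that lincluster1 waits a while for a node that has only missed heartbeats before fencing it, roughly:

    <?xml version="1.0"?>
    <cluster name="lincluster" config_version="...">
      <!-- post_fail_delay: seconds fenced waits after a node is
           reported failed before fencing it; the log above shows
           we currently run with the default of 0 -->
      <fence_daemon post_fail_delay="30"/>
      ...
    </cluster>

That would only delay the fencing though; if the mount keeps lincluster2 unresponsive for longer than the delay, it would still be shot.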

Any ideas how to fix this?

Regards,

Fajar
