[Linux-cluster] disabling DLM and GFS kernel modules

Tue Sep 18 15:35:02 UTC 2007

The nodes were killed as a result of a condition called "AISONLY",
triggered by the ntp time-of-day change, which moved the clock by more
than 10 seconds.

Totem uses gettimeofday() to measure the time elapsed between two
points in time.  If the difference is greater than 10 seconds, Totem
reports a loss of network connectivity and reforms the cluster, and the
higher level software then detects the aisonly flag on that node.

Then that node must be rebooted to be operational again.
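
To make the failure mode concrete, here is a minimal sketch in Python
(not the actual Totem code) of a timeout check that subtracts two
wall-clock readings.  The function name token_lost, the starting
timestamp and the 200 ms gap are illustrative assumptions; the 10
second threshold and the size of the ntp step come from this thread.

```python
TOKEN_TIMEOUT = 10.0  # seconds, the threshold described above


def token_lost(last_seen, now, timeout=TOKEN_TIMEOUT):
    """Return True when the apparent gap between two readings exceeds the timeout."""
    return (now - last_seen) > timeout


last_seen = 1_190_000_000.0  # arbitrary wall-clock reading (assumption)
real_gap = 0.2               # the token really arrived 200 ms later (assumption)
ntp_step = 18217.299628      # "time reset" value from the ntpd log in this thread

# No clock step between the two readings: the gap looks like 0.2 s.
print(token_lost(last_seen, last_seen + real_gap))

# ntpd steps the clock between the two readings: the gap looks like hours.
print(token_lost(last_seen, last_seen + real_gap + ntp_step))
```

The second call reports a lost token even though only 200 ms really
passed, which matches the "token was lost" message logged in the same
second as the ntpd time reset.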

Ideally gettimeofday() shouldn't be used to measure the interval between
two points in time; itimers or something else should be used instead.
But for the moment gettimeofday() is used, which doesn't notify the user
of ntp adjustments.
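
One common "something else" is a monotonic clock, which is not moved
when ntpd steps the system time.  The sketch below uses Python's
time.monotonic() for illustration; in C the usual counterpart is
clock_gettime() with CLOCK_MONOTONIC.  The ElapsedTimer class is a
hypothetical helper, and this is not a description of how openais
addressed the issue.

```python
import time

TIMEOUT_SECONDS = 10.0  # threshold quoted in this message


class ElapsedTimer:
    """Measure elapsed time with a clock that ntp steps do not move."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timer can be exercised in tests.
        self._clock = clock
        self._start = clock()

    def elapsed(self):
        """Seconds since the timer was created, per the chosen clock."""
        return self._clock() - self._start

    def expired(self, timeout=TIMEOUT_SECONDS):
        """True once more than `timeout` seconds have elapsed."""
        return self.elapsed() > timeout


timer = ElapsedTimer()
time.sleep(0.05)
print(timer.expired())
```

Because the interval is taken from a monotonic source, a wall-clock
reset made while the timer is running does not change the measured
interval.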

Regards
-steve

On Tue, 2007-09-18 at 10:00 -0500, Chris Harms wrote:
> The only other thing I can think of is that I started NTPd and there was 
> likely a big time adjustment as it had not been running.
> 
> Sep 17 10:27:32 ntpd[1118]: synchronized to 206.222.28.90, stratum 2
> Sep 17 15:53:38 ntpd[1118]: time reset +18217.299628 s
> Sep 17 15:53:38 ntpd[1118]: kernel time sync enabled 0001
> Sep 17 15:53:38 openais[4457]: [TOTEM] The token was lost in the OPERATIONAL state.
> Sep 17 15:53:38 dlm_controld[4480]: cluster is down, exiting
> Sep 17 15:53:38 gfs_controld[4486]: cluster is down, exiting
> Sep 17 15:53:38 fenced[4474]: cluster is down, exiting
> Sep 17 15:53:38 kernel: dlm: closing connection to node 1
> Sep 17 15:53:48 named[8732]: *** POKED TIMER ***
> Sep 17 15:53:48 named[8733]: *** POKED TIMER ***
> Sep 17 15:54:04 ccsd[4437]: Unable to connect to cluster infrastructure after 30 seconds.
> 
> 
> 
> David Teigland wrote:
> > On Tue, Sep 18, 2007 at 09:34:45AM -0500, Chris Harms wrote:
> >   
> >> It said something about an out of memory condition.   This was logged 
> >> just prior to where it would have panicked:
> >>
> >> groupd[9639]: found uncontrolled kernel object rgmanager in /sys/kernel/dlm
> >> groupd[9639]: local node must be reset to clear 1 uncontrolled instances of gfs and/or dlm
> >> openais[9625]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
> >> fenced[9647]: cman_init error 0 111
> >> dlm_controld[9653]: cman_init error 0 111
> >> gfs_controld[9659]: cman_init error 111
> >>     
> >
> > These messages mean that the userspace cluster software all exited for
> > some unknown reason, leaving behind a dlm lockspace (in the kernel) from
> > rgmanager.  At this point, you needed to reboot the machine, but instead
> > you restarted the userspace cluster software, which rightly complained
> > that you hadn't rebooted the machine, and refused to operate.
> >
> > This probably doesn't help, though, because it doesn't tell us anything
> > about the original problem(s) you had.  The original problem(s) probably
> > caused the cluster software to exit the first time, and was probably
> > related to the runaway processes.
> >
> >
> >   
> >> There were 2 runaway processes related to GFS / DLM before I tried to 
> >> shut it down.  We had not encountered any issues like this until now.  
> >> The only changes to our setup were a superficial change to some cluster 
> >> services, and an upgrade of the DRBD kernel module.
> >>
> >> Kevin Anderson wrote:
> >>     
> >>> On Mon, 2007-09-17 at 17:50 -0500, Chris Harms wrote:
> >>>       
> >>>> Is there an easy way to disable GFS and related kernel modules if one 
> >>>> does not need GFS?  We are running the 5.1 Beta 1 version of the cluster 
> >>>> and had a mysterious crash of the cluster suite.  There were issues with 
> >>>> the GFS and dlm modules.  The kernel panicked on shutdown.
> >>>>
> >>>>    
> >>>>         
> >>> Do you have any details on the panic?
> >>>
> >>> Kevin
> >>> ------------------------------------------------------------------------
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> Linux-cluster at redhat.com
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>>       