Re: [Linux-cluster] Freeze with cluster-2.03.11



On Mon, 30 Mar 2009, David Teigland wrote:

> On Fri, Mar 27, 2009 at 06:19:50PM +0100, Kadlecsik Jozsef wrote:
> > 
> > Combing through the log files I found the following:
> > 
> > Mar 27 13:31:56 lxserv0 fenced[3833]: web1-gfs not a cluster member after 0 sec post_fail_delay
> > Mar 27 13:31:56 lxserv0 fenced[3833]: fencing node "web1-gfs"
> > Mar 27 13:31:56 lxserv0 fenced[3833]: can't get node number for node e1??e1?? 
> > Mar 27 13:31:56 lxserv0 fenced[3833]: fence "web1-gfs" success
> > 
> > The line saying "can't get node number for node e1??e1??" might be 
> > innocent, but looks suspicious. Why could fenced not get the victim name?
> 
> I've not seen that before, and I can't explain either how cman_get_node()
> could have failed or why it printed a garbage string.  It's a non-essential
> bit of code, so that error should not be related to your problem.

Yes, it is surely not related to the freeze, but it is still disturbing.

Hm, I believe there is an ordering issue in the function
dispatch_fence_agent: the variable victim_nodename is freed, but
update_cman is then called with the variable victim, which still points
to the just-freed victim_nodename. Reading that freed memory could
explain the garbage string in the log.
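
To illustrate the ordering problem, here is a minimal sketch. It is not
the actual fenced source: the functions update_cman_sketch and
dispatch_fence_agent_sketch and the buffer last_reported are hypothetical
stand-ins, and only the names victim and victim_nodename are taken from
the description above. It shows the ordering I would expect to be safe,
with the suspected ordering left as a comment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for update_cman(): it only records the name it
 * was handed, so the caller can check what was actually read. */
static char last_reported[64];

static void update_cman_sketch(const char *victim)
{
    snprintf(last_reported, sizeof(last_reported), "%s", victim);
}

/* Hypothetical stand-in for dispatch_fence_agent(), reduced to the
 * lifetime of the victim name. Returns 0 on success, -1 on failure. */
static int dispatch_fence_agent_sketch(const char *name)
{
    size_t len = strlen(name) + 1;
    char *victim_nodename = malloc(len);  /* heap copy owned here */
    const char *victim;

    if (victim_nodename == NULL)
        return -1;
    memcpy(victim_nodename, name, len);

    victim = victim_nodename;  /* alias, not a second copy */

    /* ... the fence agent would be run here ... */

    /* Suspected ordering (undefined behaviour, so not executed here):
     *
     *     free(victim_nodename);
     *     update_cman_sketch(victim);    reads freed memory
     *
     * Safer ordering: use the alias first, release the buffer last. */
    update_cman_sketch(victim);
    free(victim_nodename);

    return 0;
}
```

Calling dispatch_fence_agent_sketch("web1-gfs") leaves "web1-gfs" in
last_reported. With the two calls swapped, the contents read through
victim are undefined, and may well be garbage of the kind seen in the log.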

Best regards,
Jozsef
--
E-mail : kadlec mail kfki hu, kadlec blackhole kfki hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary
