[Linux-cluster] CS5 / about loop "Node is undead"

Mon Jun 9 09:04:38 UTC 2008

Hi

About my problem of a node entering a loop:
Jun  3 15:54:49 s_sys at xn2 qdiskd[22256]: <notice> Writing eviction notice for node 1
Jun  3 15:54:50 s_sys at xn2 qdiskd[22256]: <notice> Node 1 evicted
Jun  3 15:54:51 s_sys at xn2 qdiskd[22256]: <crit> Node 1 is undead.

I notice that just before entering this loop, I get these messages:
Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fencing node "xn1"
Jun  3 15:54:48 s_sys at xn2 qdiskd[22256]: <info> Assuming master role

but never the message:
Jun  3 15:54:47 s_sys at xn2 fenced[22327]: fence "xn1" success

Nevertheless, the service on xn1 is correctly failed over to xn2, but
after xn1 reboots we cannot start CS5 again because of the
endless "Node is undead" loop on xn2.

Whereas, when it works correctly, both messages:
fencing node "xn1"
fence "xn1" success
appear one after the other (after about 30s)

So my question is: could this endless "Node is undead" loop
always be caused by a failed fencing of xn1 by xn2?
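
To check a log for this pattern (a "fencing node" line with no matching
"fence ... success" line after it), something like the following could be
used. It is only a minimal sketch: it assumes the fenced messages always
have exactly the form quoted above, and the function name
unconfirmed_fences is hypothetical, not part of any cluster tool.

```python
import re

# Patterns modelled on the fenced log lines quoted in this message.
# Assumption: fenced always logs these two messages in exactly this form.
FENCING_RE = re.compile(r'fenced\[\d+\]: fencing node "([^"]+)"')
SUCCESS_RE = re.compile(r'fenced\[\d+\]: fence "([^"]+)" success')


def unconfirmed_fences(lines):
    """Return the nodes whose fencing started but was never reported successful.

    `lines` is any iterable of log lines in chronological order.
    (Hypothetical helper, for illustration only.)
    """
    pending = []
    for line in lines:
        started = FENCING_RE.search(line)
        if started:
            node = started.group(1)
            if node not in pending:
                pending.append(node)
            continue
        succeeded = SUCCESS_RE.search(line)
        if succeeded and succeeded.group(1) in pending:
            pending.remove(succeeded.group(1))
    return pending
```

Feeding it the lines of the syslog file from xn2 would return a list
containing "xn1" in the failing case and an empty list in the working case.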

PS: note that I have applied this patch:
http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9

Thanks
Regards
Alain Moullé