[Linux-cluster] Nodes leaving and re-joining intermittently

Digimer linux at alteeve.com
Sat Dec 10 22:22:55 UTC 2011


On 12/10/2011 05:00 PM, Matthew Painter wrote:
> The switch was our first thought, but that has been swapped, and while
> we are not having nodes fenced anymore (we were daily), this anomaly
> remains.
> 
> I will ask for those logs and conf on Monday.
> 
> I think it might be worth reinstalling corosync on this box anyway?
> Can't be healthy if it is exiting unclearly. I have had reports of the
> rgmanager dying on this box. (pid file but not running) Could that be
> related?
> 
> Thanks :)

It's impossible to say without knowing your configuration. Please share
the cluster.conf (only obfuscate passwords, please) along with the log
files. The more detail, the better. Versions, distros, network config, etc.

Reinstalling corosync is not likely to help. RGManager is something fairly
high up in the stack, so it's not likely the cause either.

Did you configure the timeouts to be very high, by chance? I'm finding
it difficult to fathom how the node can withdraw without being fenced,
short of cleanly stopping the cluster stack. I suspect there is
something important not being said, which the configuration information,
versions and logs will hopefully expose.
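
For reference, on a cman-based cluster the totem token timeout is usually
set on the totem element in cluster.conf. A rough sketch of where to look,
assuming a cman/corosync stack; the cluster name, config_version and the
10000 ms value below are made up for illustration and are not taken from
your setup:

```xml
<?xml version="1.0"?>
<!-- Illustrative fragment only; name, version and value are assumptions. -->
<cluster name="example-cluster" config_version="1">
	<!-- token is in milliseconds; a very large value delays failure detection -->
	<totem token="10000"/>
	<!-- clusternodes, fencedevices and rm sections omitted -->
</cluster>
```

An unusually large value there would mean a lost node is declared dead
much later than expected.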

-- 
Digimer
E-Mail:              digimer at alteeve.com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron



