Re: [Cluster-devel] fence daemon problems

On Wed, Oct 03, 2012 at 04:12:10PM +0000, Dietmar Maurer wrote:
> > Yes, it's a stateful partition merge, and I think /var/log/messages should have
> > mentioned something about that.  When a node is partitioned from the
> > others (e.g. network disconnected), it has to be cleanly reset before it's
> > allowed back.  "cleanly reset" typically means rebooted.  If it comes back
> > without being reset (e.g. network reconnected), then the others ignore it,
> > which is what you saw.

> What message should I look for?

I was wrong.  I was thinking of the "daemon node %d stateful merge"
messages, which are logged at debug level but should probably be changed
to error.

> I don't really understand why 'dlm_controld' initiates fencing, although
> the node does not have quorum?
> I thought 'dlm_controld' should wait until the cluster is quorate before
> starting fence actions?

I guess you're talking about the dlm_tool ls output?  The "fencing" there
means dlm_controld is waiting for fenced to finish fencing before it starts
dlm recovery.  It is fenced that waits for quorum.

hp2:~# dlm_tool ls
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000004 kern_stop
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       2 3 4
new change    member 2 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   3 4
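
If you want to check for that state from a script, something like the
following might do.  It is only a rough Python sketch: the helper names
(parse_lockspaces, waiting_on_fencing) are made up for illustration, and it
assumes the layout shown above, i.e. each field name is separated from its
value by at least two spaces and every lockspace starts with a "name" line.
That assumption comes from this one sample, not from any documented format.

```python
import re

# Sample copied from the dlm_tool ls output above.
SAMPLE = """\
dlm lockspaces
name          rgmanager
id            0x5231f3eb
flags         0x00000004 kern_stop
change        member 3 joined 1 remove 0 failed 0 seq 2,2
members       2 3 4
new change    member 2 joined 0 remove 1 failed 1 seq 3,3
new status    wait_messages 0 wait_condition 1 fencing
new members   3 4
"""


def parse_lockspaces(text):
    """Split dlm_tool ls style text into one dict per lockspace.

    Assumption: key and value are separated by two or more spaces,
    and a "name" line opens each lockspace.
    """
    lockspaces = []
    for line in text.splitlines():
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            continue  # header line ("dlm lockspaces") or blank line
        key, value = parts
        if key == "name":
            lockspaces.append({})
        if lockspaces:
            lockspaces[-1][key] = value
    return lockspaces


def waiting_on_fencing(lockspace):
    """True if the pending change is still waiting for fencing to finish."""
    return "fencing" in lockspace.get("new status", "").split()


for ls in parse_lockspaces(SAMPLE):
    if waiting_on_fencing(ls):
        print(ls["name"], "is waiting for fenced; new members:", ls["new members"])
```

On a live node you would feed it the text captured from dlm_tool ls instead
of SAMPLE.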

