[Linux-cluster] DLM locks with 1 node on 2 node cluster

Mon Aug 28 19:33:38 UTC 2006

On Mon, Aug 28, 2006 at 03:18:24PM -0400, Zelikov_Mikhail at emc.com wrote:
> While the node is down (bof226) I do fence_ack_manual -n bof226. I start
> getting the following messages in the /var/log/messages:
> 
> Aug 28 15:08:30 bof227 fence_manual: Node bof226 needs to be reset
> before recovery can procede.  Waiting for bof226 to rejoin the cluster
> or for manual acknowledgement that it has been reset (i.e.
> fence_ack_manual -n bof226)
> Aug 28 15:10:33 bof227 ccsd[2433]: process_get: Invalid connection
> descriptor received.
> Aug 28 15:10:33 bof227 ccsd[2433]: Error while processing get: Invalid
> request descriptor
> Aug 28 15:10:33 bof227 fenced[2497]: fence "bof226" failed

Strange bug you've found; I've not seen those ccsd errors before.
fence_manual doesn't use ccs, so I'm not sure how ccsd is getting involved.

> > Is there a special reason you're using both gnbd and manual fencing?
> > I've never seen that done before and can't think of a reason you'd want
> > to.

> I was under the impression that if there is no hw fencing device then the
> manual one is required. It was also my understanding that if I use gnbd
> devices then explicit gnbd fencing is required as well.

If you're using gnbd you have three separate options for fencing:

- fence_gnbd, or
- hardware fencing, or
- fence_manual

All three are independent of each other and none need to be combined;
just pick one.  We only recommend fence_manual when experimenting.
fence_gnbd is a perfectly legitimate alternative to hw fencing.

Dave