[Linux-cluster] Node Failure Detection Problems

James Firth forums at daltonfirth.co.uk
Mon Mar 20 22:33:48 UTC 2006


Benjamin Marzinski wrote:
> Are you exporting the gnbds in clustered or unclustered mode (with the -c
> option or not)? In uncached, you should be able to run "gnbd_import -Or <gnbd>"
> It won't actually remove the device if it is opened, but it should cause
> all the pending IOs to fail.

Hi Ben - I am running uncached (i.e. NOT exporting with the -c option).

> In uncached mode, after your timeout, all the
> IOs should get flushed assuming that gnbd can fence the server. 
> If this
> isn't happening, can you please send me a more complete description of your
> gnbd setup and problem, including the result of the following set of commands, run
> after the server node fails.
> 

We have a high-availability server pair that needs a common storage pool 
with no single point of failure, so none of the SAN approaches suit us.

Instead we have created an md device (RAID level 1) with one local disk 
and one gnbd-imported device.  We are not using multipath; instead each 
server has two bonded network interfaces, connected to different hubs, with 
one
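
For reference, the setup is roughly the following (a simplified sketch 
rather than our exact commands; /dev/sdb1, the export name shared_disk 
and the hostname peer-node are placeholders):

    # on the peer node (gnbd_serv already running): export its
    # partition uncached, i.e. without -c
    gnbd_export -d /dev/sdb1 -e shared_disk

    # on the node running the service: import the peer's export and
    # mirror it with the local partition
    gnbd_import -i peer-node
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/sdb1 /dev/gnbd/shared_disk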

The gnbd import and RAID mount are managed as a single cluster service 
that can fail over between the two nodes and only ever runs on one node 
at a time.
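
In practice that service boils down to a small start/stop script along 
these lines (again a simplified sketch with placeholder names, not our 
production script):

    #!/bin/sh
    # illustrative only: import the peer's gnbd, assemble and mount the mirror
    case "$1" in
    start)
        gnbd_import -i peer-node
        mdadm --assemble /dev/md0 /dev/sdb1 /dev/gnbd/shared_disk
        mount /dev/md0 /data
        ;;
    stop)
        umount /data
        mdadm --stop /dev/md0
        gnbd_import -r shared_disk
        ;;
    esac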

We are using DLM locking, and fenced with a proprietary fence agent 
(although fencing is not a vital part of data integrity in our scheme).

Failure of the active node is handled well: the services migrate, the 
remaining node mounts a (degraded) md device from its local disk, and 
cluster operation is maintained.

When the dead node returns, a custom script hot-adds the returning 
gnbd-imported disk, and the md device recovers.
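
That recovery step is essentially just (same placeholder names as above):

    # re-import from the returned node and hot-add it back into the mirror
    gnbd_import -i peer-node
    mdadm --manage /dev/md0 --add /dev/gnbd/shared_disk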

The problem comes when the STANDBY node fails.  The md device does not 
take well to failure of the imported gnbd device.  On failure of the 
standby node, the md mount on the active node just hangs.

So at the moment we have had to write a custom script that checks for 
failure of the node from which the gnbd devices are imported.

On detection of failure, our script has to manually fail the imported 
gnbd device within the md array (mdadm --fail), at which point the md 
device unfreezes.  Our script then hot-removes it from the array, for 
completeness.
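
So the guts of the workaround are no more than the following, run on the 
active node once it decides the gnbd server is dead (same placeholder 
names as above):

    # fail the hung gnbd member so the mirror unfreezes, then drop it
    mdadm --manage /dev/md0 --fail /dev/gnbd/shared_disk
    mdadm --manage /dev/md0 --remove /dev/gnbd/shared_disk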

I had thought that gnbd_recvd would hook into CMAN/Magma and that the 
device imported from the failed node would automatically fail.

Regards,
James





