[Linux-cluster] Re: [RFC] DRBD + GFS cookbook (Lon Hohberger)

Koustubha Kale koustubha_kale at yahoo.com
Tue Dec 11 08:42:41 UTC 2007


>> 2) The managed NFS service refuses to failover. I am not sure whether
>> this is because of manual fencing. Our APC MasterSwitch is expected
>> shortly, so we will know more about NFS failover after we have proper
>> fencing setup.
>> I would be very interested in trying this fencing through DRBD..

>I don't like the idea of asking a node who has been evicted from the
>cluster to "stop I/O pretty-please-with-sugar-on-top", but that's just
>my opinion.

Agreed. 

>A simple outdate-peer script could be done using ssh assuming
>distributed keys:

>  ssh <host> "drbdadm outdate all"

My presumption here is that DRBD has lost contact with the peer, but the RHCS cluster may still be up and the nodes may still be able to talk to each other over the other link. So we try to use that link to I/O-fence or STONITH the peer. Correct?
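Something like this is what I have in mind (a minimal sketch, assuming distributed ssh keys; "peer-alt" is a hypothetical name for the peer's address on the surviving link, and the exit codes follow the fence-peer handler conventions in drbd.conf(5)):

    #!/bin/sh
    # Minimal outdate-peer sketch: reach the peer over the other
    # cluster link and mark its disk Outdated. "peer-alt" is a
    # placeholder for the peer's address on that link.
    if ssh -o ConnectTimeout=5 peer-alt "drbdadm outdate all"; then
            exit 4   # drbd.conf(5) convention: peer outdated
    fi
    exit 5           # peer unreachable over this link, too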

Does it work as expected? I see two problems...

a) When we use something like this (from your cookbook):

        disk {
                fencing resource-and-stonith;
        }
        handlers {
                outdate-peer "/sbin/obliterate"; # We'll get back to this.
        }

when this handler gets called, both nodes will try to fence each other. Is that the intended effect?

b) If we try ssh <host> "drbdadm outdate all" while GFS is still mounted on top of DRBD and DRBD is Primary, the command has no effect and the split brain continues. I have seen this.
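One way around (b) might be to escalate to power fencing when the polite outdate fails or is refused. A sketch along those lines (not your actual /sbin/obliterate, just my reading of it; it assumes RHCS fencing is configured so fence_node works, that DRBD exports DRBD_PEER to the handler, and the exit codes again follow drbd.conf(5)):

    #!/bin/sh
    # Escalating handler sketch: try to outdate the peer first,
    # then power-fence it through the RHCS fencing configuration.
    PEER=${DRBD_PEER:?need the peer's node name}

    # Polite route; refused if the peer is still Primary with
    # GFS mounted, as observed in (b) above.
    if ssh -o ConnectTimeout=5 "$PEER" "drbdadm outdate all"; then
            exit 4   # peer outdated
    fi

    # Escalate: have RHCS power-cycle the peer.
    if fence_node "$PEER"; then
            exit 7   # peer got stonithed
    fi

    exit 1           # could not fence; DRBD keeps I/O frozen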


>> Another question may be OT sorry if so. Is there a way to failover
 the diskless nodes to other cluster server in case of one cluster server
 going down?

>I don't fully understand.  You want to start a service on a node which
>doesn't have access to DRBD - but the service depends on DRBD?

We are using a three-server-node cluster. Two of the server nodes act as the shared storage in Active-Active DRBD. The third server node mounts the GFS volumes through a managed NFS service. All three cluster nodes act as servers for diskless nodes (XDMCP through LVS, direct routing method). The diskless nodes are not part of the RHCS cluster; they are thin clients for students.
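For reference, the LVS side of this is set up along these lines (addresses and the scheduler choice here are illustrative; XDMCP is UDP port 177 and -g selects direct routing):

    # Virtual XDMCP service on the VIP; source hashing (-s sh)
    # keeps a thin client on the same real server while that
    # server stays alive.
    ipvsadm -A -u 192.168.0.100:177 -s sh

    # The three server nodes as direct-routing real servers.
    ipvsadm -a -u 192.168.0.100:177 -r 192.168.0.11 -g
    ipvsadm -a -u 192.168.0.100:177 -r 192.168.0.12 -g
    ipvsadm -a -u 192.168.0.100:177 -r 192.168.0.13 -g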
What I was wondering about is whether there is a way to switch over a user's session in the event of a server cluster node crashing. It would not have to depend on DRBD: the other server node will still be active as DRBD Primary, and the third server will continue working, with NFS failing over to the remaining DRBD machine.
Currently the user's session freezes and they have to restart the thin client to get connected to one of the remaining servers.

Regards
Koustubha Kale




