Re: [Linux-cluster] quorum / hba issues

On Tue, 2006-12-12 at 10:10 -0700, Daryl Fenton wrote:
> Right now we have 2 HP blade servers (Blade1 and Blade3) running Red Hat
> AS 4U4 and Cluster Suite 4; both access LVM volumes on our EMC
> CX700 SAN. Presently we have a 350 GB ext3 volume and a 350 GB GFS
> volume that they are trying to share using Cluster Suite and NFS. The
> following issue occurs when we run tests on our ext3 NFS share. When we
> take down one of the HBA connections to Blade1, multipath kicks in and
> everything works fine, but when we disable all of the HBA connections on
> Blade1, qdiskd notices that Blade1 can’t access the qdisk and
> the cluster fences Blade1, which causes it to reboot itself. The
> problem is that when Blade1 comes back up, it can’t find its quorum disk
> since the HBA is still down. Since the quorum disk needs cman to work,
> cman fires up fine and Blade1 joins the cluster. The next service to
> start is qdiskd, which fails since Blade1’s HBA is down and it can’t see
> the quorum disk. Once everything is started, Blade1 tries to get its
> services back from the cluster and fails them since its HBA is down.
> It then just sits there in the failed state until manual intervention.
> Is there a way to get Blade1 not to join the cluster while its HBA is
> still down, or, if it does join the cluster, to tell it to fence itself /
> not accept any services?

There's a bug open about this; we're still trying to figure out the best
way to handle it without breaking backward compatibility.  I would
expect a (testable) fix to be constructed this week.
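
In the meantime, one possible stopgap (not the fix being worked on) is
to check at boot that the quorum device is readable before cman is
started at all. Below is a rough Python 3 sketch of such a check. The
function names, the 512-byte read size, and the exit-code convention
are all assumptions made for illustration; none of this is part of
Cluster Suite.

```python
import os
import sys


def device_readable(path, nbytes=512):
    """Return True if nbytes can be read from the start of path.

    Hypothetical helper: a node whose HBAs are all down should fail
    either the open or the read on its quorum device.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return False
    try:
        return len(os.read(fd, nbytes)) == nbytes
    except OSError:
        return False
    finally:
        os.close(fd)


def main(argv):
    """Return 0 if the device is readable, 1 if not, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: check_qdisk.py <quorum-device-path>", file=sys.stderr)
        return 2
    return 0 if device_readable(argv[1]) else 1
```

A wrapper run ahead of the cluster init scripts could call
sys.exit(main(sys.argv)) with the quorum device path and skip starting
cman on a non-zero result, so the node stays out of the cluster until
its storage path is back. Note that this only tests the path once at
boot; it does nothing if the HBAs fail later.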


-- Lon

