
RE: [Linux-cluster] Problems with cluster (fencing?)



Nothing to add on the first part, but regarding this:

> Also, if I shut down both nodes and start just one of them, the starting
> node still waits in the "starting fencing" part for many minutes, even
> though the cluster should be quorate (there's a quorum disk)!

I had a similar situation. The reason the first node couldn't come up alone was that cman started before qdiskd, so cman didn't see the quorum disk's votes and wasn't quorate at that point. I reversed the boot order of the two services, and now the node immediately boots up and is up and running...
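On RHEL/CentOS 5, SysV init starts services in the lexical order of the S-prefixed symlinks in the runlevel directory, so "reversing the boot order" means giving qdiskd a lower start number than cman. A minimal sketch of that ordering rule (the S-numbers below are illustrative, not the shipped defaults):

```shell
# Sketch only: SysV init runs S-links in lexical (sorted) order, so the
# two-digit prefix decides which service starts first. We simulate a
# runlevel directory with hypothetical S-numbers.
demo=$(mktemp -d)

# Problematic order: cman (S21) starts before qdiskd (S22)
touch "$demo/S21cman" "$demo/S22qdiskd"
before=$(ls "$demo" | head -n1)   # first service init would start

# Fixed order: qdiskd (S20) now sorts ahead of cman (S21)
rm "$demo"/S*
touch "$demo/S20qdiskd" "$demo/S21cman"
after=$(ls "$demo" | head -n1)

echo "first started before fix: $before, after fix: $after"
rm -rf "$demo"
```

On a real node you would check the current order with `ls /etc/rc3.d | egrep 'cman|qdiskd'`, lower the start priority in the qdiskd init script's `# chkconfig:` header, and re-register it (`chkconfig --del qdiskd && chkconfig --add qdiskd`) so the links are recreated from the header.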

-hjp



-----Original Message-----
From: linux-cluster-bounces redhat com on behalf of Hunt, Gary
Sent: Wed 3/18/2009 22:47
To: linux clustering
Subject: RE: [Linux-cluster] Problems with cluster (fencing?)
 
I was fighting a very similar issue today.  I am not familiar with the fencing you are using, but I would guess your fence device is not working properly.  If a node fails and fencing doesn't succeed, all GFS activity is halted.  If clustat shows both nodes and the quorum disk online but no rgmanager, try running fence_tool leave and then fence_tool join on both nodes.  That worked for me today.

Starting one node while the other is down fails because fenced tries to fence every node not present before proceeding.  I am testing clean_start="1" in cluster.conf, and it has worked well so far, but definitely read the fenced man page about clean_start before using it; it does carry some risk.
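For reference, clean_start goes on the fence_daemon element in /etc/cluster/cluster.conf. A sketch (the post_join_delay value here is illustrative; clean_start="1" tells fenced not to fence nodes at startup, which is exactly the risk the man page warns about):

```xml
<!-- /etc/cluster/cluster.conf fragment (sketch; delay value is an
     example). clean_start="1" makes fenced skip startup fencing of
     absent nodes - see fenced(8) before enabling this. -->
<fence_daemon clean_start="1" post_join_delay="20"/>
```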

Gary

From: linux-cluster-bounces redhat com [mailto:linux-cluster-bounces redhat com] On Behalf Of Mikko Partio
Sent: Wednesday, March 18, 2009 2:43 AM
To: linux clustering
Subject: [Linux-cluster] Problems with cluster (fencing?)

Hello all

I have a two-node cluster with a quorum disk.

When I pull the power cord from one node, the other node freezes the shared GFS volumes and all activity stops, even though the cluster maintains quorum. When the failed node boots back up, I can see that "starting fencing" takes many minutes, and afterwards starting clvmd fails. That node therefore cannot mount the GFS disks, since the underlying LVM volumes are missing.

Also, if I shut down both nodes and start just one of them, the starting node still waits in the "starting fencing" part for many minutes, even though the cluster should be quorate (there's a quorum disk)!

The fencing method used is HP iLO 2. I don't remember seeing this behavior in CentOS 5.1 (we're now running 5.2). Any clue what might cause this?
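In case it helps to compare configurations, a typical cluster.conf fragment for iLO fencing looks something like the sketch below. The fence_ilo agent name is standard in Red Hat Cluster Suite, but the hostnames, login, and password here are placeholders, not values from this cluster:

```xml
<!-- Hypothetical fence configuration sketch; all host/credential
     values are placeholders. -->
<clusternode name="node1" nodeid="1" votes="1">
  <fence>
    <method name="1">
      <device name="ilo-node1"/>
    </method>
  </fence>
</clusternode>
...
<fencedevices>
  <fencedevice agent="fence_ilo" name="ilo-node1"
               hostname="node1-ilo.example.com"
               login="Administrator" passwd="secret"/>
</fencedevices>
```

It can also be worth testing the agent by hand (e.g. running fence_ilo directly against the iLO with the same credentials) to rule out a fence device that silently fails.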

Regards

Mikko



