[Linux-cluster] How to disable node?

Jakov Sosic jakov.sosic at srce.hr
Tue Sep 1 10:48:51 UTC 2009


On Tue, 01 Sep 2009 12:29:36 +0200
"Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" <mad at wol.de>
wrote:

> It isn't misbehaving at all here.
> 
> The job of RHCS in this case is to save your data against failure.
> 
> If fenced can't fence a node successfully, RHCS will wait in a
> stalled state (because it doesn't get a successful response from the
> fence agent) until someone who knows what he is doing comes around
> and fixes the problem. If it didn't work that way, a separated node
> could eat your data. It is the job of fenced to stop all activity
> until fencing is in working shape again.
> 
> This behaviour is perfectly fine IMO...

Isn't that the mission of quorum? For example, if you have quorum you
run services; if you don't have quorum, you don't. If there is a
qdisk and one of the three nodes is missing, that node can't have
quorum on its own, so it can't run services?
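
For reference, the quorum arithmetic here can be sketched with an
illustrative cluster.conf fragment (node names and tunables invented,
not taken from my actual config): three one-vote nodes plus a one-vote
qdisk give 4 expected votes, so quorum is 3, and an isolated node
holding at most its own vote plus the qdisk cannot reach it.

```xml
<!-- Illustrative only: 3 nodes x 1 vote + qdisk 1 vote = 4 expected
     votes; quorum = 3, so a lone node (1 vote, or 2 if it also owns
     the qdisk) stays inquorate and cannot run services -->
<cman expected_votes="4"/>
<clusternodes>
  <clusternode name="node1" nodeid="1" votes="1"/>
  <clusternode name="node2" nodeid="2" votes="1"/>
  <clusternode name="node3" nodeid="3" votes="1"/>
</clusternodes>
<quorumd interval="1" tko="10" votes="1" label="myqdisk"/>
```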

OK, I understand that this is the safer way... But that's why I was
asking in the first place for a command to flag a node as missing
completely, so that I can avoid reconfigurations altogether.
Reconfiguring while a node is missing triggers odd behaviour when the
node comes back: the node will be fenced constantly because it has the
wrong config version.
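
For context, the version check involved here is the config_version
attribute at the top of cluster.conf: every reconfiguration bumps it,
and a node that rejoins still carrying an older value is out of sync
with the rest of the cluster. A hedged illustration (cluster name
invented):

```xml
<!-- Illustrative: a node that comes back with config_version="41"
     while the running cluster is at "42" has a stale configuration,
     which is what leads to the repeated fencing described above -->
<cluster name="mycluster" config_version="42">
  ...
</cluster>
```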


> - You use system-dependent fencing like "HP iLO" which will be missing
>   if your system is missing and no independent fencing like an
>   APC PowerSwitch...

Yes, but those are the only devices I have available for fencing. So
that is a hardware limitation, over which I have no influence in this
case. I already know that the fence devices are currently my only
SPOF... but I can't help it.


>   Think about a power surge which kills both of your PSUs in a
>   system: a system-dependent management device would be missing from
>   your network in this case, leading to exactly the problem you're
>   faced with.

I will take a look at whether the APC UPSes have something like
killpower for individual ports; if not, I will set up dummy manual
fencing to work around this problem. Thank you.
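
As a rough sketch of what that manual-fencing fallback would look like
(node and device names invented; fence_manual is a placeholder agent
that simply blocks until an operator confirms with fence_ack_manual,
so it offers no real protection against a live split-brain):

```xml
<!-- Illustrative fragment: fence_manual waits until a human runs
     fence_ack_manual on the node where fenced is blocked -->
<fencedevices>
  <fencedevice agent="fence_manual" name="human"/>
</fencedevices>
<clusternode name="node1" nodeid="1">
  <fence>
    <method name="1">
      <device name="human" nodename="node1"/>
    </method>
  </fence>
</clusternode>
```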


> Your mistake is that you started fenced in normal mode, in which it
> will fence all nodes that it can't reach, to get around a possible
> split-brain scenario. You need to start fenced in "clean start"
> mode, without fencing (read the fenced manpage, as it is documented
> there), because you know everything is all right.

Adding clean_start again presumes reconfiguring, just like removing a
node and declaring the cluster two_node would, and I wanted to avoid
reconfigurations...
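
For completeness, the clean_start knob Marc mentions lives on the
fence_daemon element, so enabling it is itself a cluster.conf change,
exactly the kind of reconfiguration I am trying to avoid. Roughly
(delay values illustrative):

```xml
<!-- clean_start="1" makes fenced skip startup fencing of nodes it
     cannot see; only safe when you know no split-brain is possible -->
<fence_daemon clean_start="1" post_join_delay="3" post_fail_delay="0"/>
```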


Thank you very much.


-- 
|    Jakov Sosic    |    ICQ: 28410271    |   PGP: 0x965CAE2D   |
=================================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/   |



