[Linux-cluster] Dependency graphs, order of startup and failover handling
Lon Hohberger
lhh at redhat.com
Fri Dec 8 18:08:47 UTC 2006
So, the short answer to your question is, "Well, not yet, but, er, funny
that you should ask ..."
On Thu, 2006-12-07 at 22:11 +0200, Eftychios Eftychiou wrote:
> Got a few questions regarding RHCS. Would be grateful if someone could
> answer them.
>
> 1. Fencing in a two node cluster according to the FAQ could lead to
> each node trying to fence each other resulting in total cluster
> blowup. The suggested solution of that problem was to use serialized
> fencing devices. From my understanding the above scenario can occur
> because in a two node environment each node has 1 vote so there is no
> way to break the tie. In any case in a total unrelated section it is
> mentioned that a quorum disk can be used to break the tie. Well I am a
> bit confused since in the section it was mentioned that quorum disk
> support was initially removed from RHEL4 but then was added
> optionally. What is the proper way to create a two node cluster ? Is
> quorum disk required if a serialized fence device is used? Will quorum
> disks usage be phased out in the future ? If yes how do you plan to
> address the issues in a 2 node cluster.
>
> 2. I have seen veritas cluster server (VCS) being configured and it
> had the ability to define a dependency graph of the various services
> available. Is such a thing possible on RHCS? Let me be more specific.
Ironically, not... yet. It's a month or two out at the least.
I have basic dependency graph generation and transition-calculation
using a best-first-search in my sandbox, but there are a couple of bugs
I have to fix before I commit it to CVS (and the initial commit will
only be a PoC; it won't be integrated for at least another month or
two).
It allows you to specify:
- require-always: A must be running before B may start, and if A is
restarted, B must restart (note: A moving to the 'failed' state will
cause B to go to the 'stopped' state)
- require-start: A must running before B may start, but if A restarts,
nothing happens to B (e.g. it's smart enough to reconnect). Same about
moving to 'failed' state though.
- colocate-never: when both A and B are running, A must never run on the
same node as B
- colocate-always: when both A and B are running, A must always run on
the same node as B
> 2 node cluster. Only one node is active running all services. 2nd node
> is passive and in case of failure should take over from node 1.
>
> Application X depends on the existence of network Z
> Application Y also depends on Z
> Application U, I and O depend on X
> Application G depends on nothing directly.
This can mostly be handled now, using 'recover' instead of stop/start
cycles and custom resource agents:
<service name="foo">
<ip ... monitor_link="yes"> <!-- monitors the link for
the service -->
<app_x>
<app_u/>
<app_i/>
<app_o/>
</app_x>
<app_y/>
</ip>
</service>
<service name="bar">
<script name="G" />
</service>
> Is is possible to configure the cluster to behave in the following
> manner
> If U or I or O die restart them
> If X dies the stop all applications and relocate them
You mean U, I, and O, right? G doesn't depend on X.
> if Y dies then restart only Y
> if Z dies Relocate all applications to Node 2
... except G ...
> If G dies restart it
> Of course if there are multiple false restarts all applications should
> be moved to node 2
Nothing for 'false starts' is currently in (as seen in RHCS3). That's
probably a "soon" thing;
> The above applications/components should be started in the following
> order
> Z or G ( G does not depend on anything )
> X or Y
> O, U,I
There are actually several start orders given the dependency set you
described, but yes, that's one of them.
> The actual setup is quite more complicated with multiple dependencies.
> I know it is possible to build dependencies in the service definition
> however I do not see how the above can be accomplished.
Well, if you don't craft special resource agents, then it can't
currently be done. Attached is an example cluster config w/
dependencies. Running it through the test tool:
Start service:G on 1 [1]
Start service:Z on 1 [2]
Start service:Y on 1 [3]
Start service:X on 1 [4]
Start service:O on 1 [5]
Start service:U on 1 [6]
Start service:I on 1 [7]
Application of those operations and running it through graphviz (dot)
yields the attached png as a result.
(come on, you gotta run it through graphviz!)
-- Lon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: e.png
Type: image/png
Size: 30462 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20061208/1fbaf7ca/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: eftychios.conf
Type: application/xml
Size: 1226 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20061208/1fbaf7ca/attachment.wsdl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20061208/1fbaf7ca/attachment.sig>
More information about the Linux-cluster
mailing list