[Linux-cluster] N+M support?

Sat Feb 13 15:55:29 UTC 2010

On 13 February 2010 14:01, Rafael Micó Miranda <rmicmirregs at gmail.com> wrote:
> Hi Martin,
>
> All your questions point to an advanced configuration using failover
> domains and service properties.
>
> On Sat, 13-02-2010 at 06:14 +0000, Martin Waite wrote:
>> Hi,
>>
>> Suppose I have 3 services running on 5 nodes.  Each node can run only 1 service; 2 nodes are reserved for failover.
>
> Let's name the services S1, S2 and S3, the service nodes N1, N2 and N3,
> and the failover nodes N4 and N5.
>
>>
>> It is easy to configure rgmanager to cope with the first service node failure by including the 2 failover nodes in the failover domain for each service.
>
> Yes, you can configure it as follows (naming the failover domains F1,
> F2 and F3):
>
> F1: service S1, nodes N1, N4, N5
> F2: service S2, nodes N2, N4, N5
> F3: service S3, nodes N3, N4, N5
>
> For this you'll need to set the "restricted" and "ordered" properties
> on the failover domains. Another property, "auto_failback", will also
> be of interest to you, so keep it in mind.
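A minimal Python sketch of how the restricted, ordered domains F1 to F3 above choose a node. It is a simplified model for illustration, not rgmanager code; `DOMAINS` and `pick_node` are hypothetical names. The last call shows why ordering alone is not enough: after N1 and N2 fail, S2 would also be sent to N4, where S1 is already running.

```python
# Hypothetical model: each restricted, ordered failover domain is a list of
# member nodes in priority order. Illustrates the rules only; this is not
# rgmanager's placement code.
DOMAINS = {
    "F1": ["N1", "N4", "N5"],  # service S1
    "F2": ["N2", "N4", "N5"],  # service S2
    "F3": ["N3", "N4", "N5"],  # service S3
}


def pick_node(domain, alive):
    """Return the highest-priority live member of the domain, or None.

    "restricted": only members of the domain are considered.
    "ordered": members are tried in the listed priority order.
    """
    for node in DOMAINS[domain]:
        if node in alive:
            return node
    return None  # no live member: the service cannot start


all_nodes = {"N1", "N2", "N3", "N4", "N5"}
print(pick_node("F1", all_nodes))                # N1: preferred node is up
print(pick_node("F1", all_nodes - {"N1"}))       # N4: first failover node
print(pick_node("F2", all_nodes - {"N1", "N2"}))  # N4 again: ordering alone ignores S1
```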
>
>>
>> However, is it possible to configure rgmanager such that on a second failure, only the failover node that is not currently running a service is considered for use ?
>
> Yes. Services have a "run exclusive" option, which only allows a service
> to run on a node that has no other services running. With this option
> on, if N1 fails, S1 will fail over to N4. If N2 then fails, service S2
> will be migrated to N5, not to N4.
>
>>
>> Further, that if a third failure occurs, the affected service is not migrated at all ?
>
> Yes. If you have set the "run exclusive" option on all three services,
> the third affected service will not be migrated to any node, according
> to:
>
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.4/html/Cluster_Administration/s1-add-service-CA.html
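A minimal Python sketch of "run exclusive" on top of the restricted, ordered domains F1 to F3, replaying the failures of N1, N2 and N3 in turn. The names `place` and `fail` are hypothetical, and the model deliberately ignores recovery policies, start ordering and everything else rgmanager actually does.

```python
# Hypothetical, simplified model of "run exclusive" on top of restricted,
# ordered failover domains. Not rgmanager's placement code.
DOMAINS = {  # service -> domain members in priority order
    "S1": ["N1", "N4", "N5"],  # F1
    "S2": ["N2", "N4", "N5"],  # F2
    "S3": ["N3", "N4", "N5"],  # F3
}
EXCLUSIVE = {"S1", "S2", "S3"}  # "run exclusive" enabled on every service


def place(service, alive, placement):
    """Return the node `service` would start on, or None if none qualifies."""
    for node in DOMAINS[service]:  # restricted + ordered
        if node not in alive:
            continue
        others = [s for s, n in placement.items() if n == node and s != service]
        # An exclusive service needs an empty node, and nothing may join a
        # node that already runs an exclusive service.
        if others and (service in EXCLUSIVE or any(s in EXCLUSIVE for s in others)):
            continue
        return node
    return None


def fail(node, alive, placement):
    """Mark `node` dead and try to relocate the services it was running."""
    alive.discard(node)
    for service in [s for s, n in placement.items() if n == node]:
        del placement[service]
        target = place(service, alive, placement)
        if target is not None:
            placement[service] = target  # otherwise it is not restarted


alive = {"N1", "N2", "N3", "N4", "N5"}
placement = {"S1": "N1", "S2": "N2", "S3": "N3"}
for failed in ["N1", "N2", "N3"]:
    fail(failed, alive, placement)
    print(f"{failed} down: {placement}")
```

After the loop, S1 runs on N4 and S2 on N5, while S3 is not restarted anywhere because both failover nodes are already occupied.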
>
>
>>
>> Also, is it possible to rank the services such that if the failover nodes are occupied by low ranking services and the node running a higher ranking service fails, that the lowest ranking service is evicted so that the higher ranking service can be failed over ?
>>
>
> As far as I know, there is no built-in feature that does what you ask.
>
> You can combine all these options to get something close to what you
> need. For example, I propose this configuration:
>
> Nodes:
>
> N1, N2, N3: service nodes
> N4, N5: failover nodes
>
> Services:
>
> S1: high ranking service. "Run exclusive" on.
> S2, S3: low ranking services. "Run exclusive" off.
>
> Fail Over Domains:
>
> F1: service S1. Nodes N1, N4, N5. "Restricted" on. "Ordered" on.
> "Auto_failback" off.
> F2: service S2. Nodes N2, N5, N4. "Restricted" on. "Ordered" on.
> "Auto_failback" on.
> F3: service S3. Nodes N3, N5, N4. "Restricted" on. "Ordered" on.
> "Auto_failback" on.
>
> With this configuration, a failure of both N2 and N3 only penalises
> services S2 and S3, because both will run on the same node (N5), but
> you keep N4 free for your high-ranking S1 service. With "Auto_failback"
> on in failover domains F2 and F3, S2 or S3 will automatically migrate
> back to its preferred node when that node comes back alive, so the
> penalty will be shorter.
>
>
> Remember that you are talking about the failure of up to 3 nodes in a
> 5-member cluster. This may not make sense, because depending on your
> configuration you could lose quorum before reaching that situation.
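The quorum point is simple arithmetic. A minimal Python sketch, assuming the cman defaults of one vote per node and no quorum disk, where the quorum is `expected_votes // 2 + 1`; the function names are hypothetical.

```python
# Assumes one vote per node and no quorum disk: the cluster is quorate
# while it holds at least expected_votes // 2 + 1 votes.
def quorum(expected_votes):
    return expected_votes // 2 + 1


def survives(total_nodes, failed):
    """True if the remaining single-vote nodes still form a quorum."""
    return total_nodes - failed >= quorum(total_nodes)


for failed in range(4):
    print(f"{failed} of 5 nodes down: quorate={survives(5, failed)}")
```

With 5 single-vote nodes the quorum is 3, so the cluster stays quorate after two failures but not after a third, however the failover domains are set up. A quorum disk or different vote counts would change these numbers.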
>
>
>> regards,
>> Martin
>>
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> Cheers,
>
> Rafael
>
> --
> Rafael Micó Miranda

Hi Rafael,

Thank you very much for the information.  The "run exclusive" option does appear to do what I need.  As for the service ranking and eviction scenario, you are correct that this only becomes necessary in the event of multiple failures, and perhaps I don't need to go that far.  The easiest solution is to increase the "M" in the "N+M" set of nodes: I can survive one failure with M=1, two with M=2, and so on.  If more nodes fail than can be failed over, you are entering disaster recovery (DR) territory.

regards,
Martin