[Linux-cluster] Common Cluster Infrastructure discussion

Tue May 31 19:29:34 UTC 2005

On 5/31/05, Lars Marowsky-Bree <lmb at suse.de> wrote:
> On 2005-05-21T09:29:01, Robert Wipfel <rawipfel at novell.com> wrote:
> 
> > outside looking in, is web services and grid.  Returning to the
> > reality of many vendor's enterprise* business, the sweet spot for h/a
> > clusters still seems to be somewhere around ~8 dual-CPU nodes with
> > many customers deploying multiple similar clusters. Nodes are never in
> > multiple clusters at once, rather, individual nodes are members of a
> > cluster and that cluster might be a member of a cluster of clusters.

To restate the proposal in response to this distinction, that "Nodes
are never in multiple clusters at once, rather, individual nodes are
members of a cluster and that cluster might be a member of a cluster of
clusters": in the language of the CCI proposal, one or more nodes in a
subcluster, but not necessarily all of them, would join the larger
cluster, and these liaison nodes would handle the communications
between this subcluster and other subclusters.  Using CCI, different
subclusters in this grid could run different clustering frameworks, and
some members of the supercluster might be lone boxes that do not
represent clusters at all.

The idea is to implement a cluster being a member of a cluster of
clusters with the same interface that is used to manage a node being a
member of a cluster.
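
A minimal sketch of what "the same interface" could look like, written
as a plain composite structure.  Every name here (Member, Node,
Cluster, join, node_count) is made up for illustration; none of it
comes from the CCI proposal, which does not define an API at this
level of detail.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class Member(Protocol):
    """Anything that can join a cluster: a lone box or a whole subcluster."""
    member_id: str

    def node_count(self) -> int: ...


@dataclass
class Node:
    """A single machine (hypothetical type, not part of any CCI spec)."""
    member_id: str

    def node_count(self) -> int:
        return 1


@dataclass
class Cluster:
    """A cluster is itself a Member, so it can join another cluster."""
    member_id: str
    members: List[Member] = field(default_factory=list)

    def join(self, member: Member) -> None:
        # The same call is used whether the joiner is a Node or a Cluster.
        self.members.append(member)

    def node_count(self) -> int:
        # Count physical nodes by recursing through nested clusters.
        return sum(m.node_count() for m in self.members)


# An 8-node h/a cluster, as in the quoted "sweet spot".
ha = Cluster("ha-1")
for i in range(8):
    ha.join(Node(f"ha-1-n{i}"))

grid = Cluster("grid")
grid.join(ha)                 # a subcluster joins as one member
grid.join(Node("lone-box"))   # a lone box joins the same way
```

The supercluster sees two members, one of which happens to stand for
eight nodes; whether the real interface would expose that nesting is
an open question.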

I take away from LMB's remark a requirement that the CCI must support
in-cluster selection/election for a presented service, so that the
liaison nodes (which could be all nodes in the subcluster, or a subset
of them) can present themselves as a coherent authority to the other
members of the supercluster, over the channels defined by the
supercluster, representing a single node identifier in the
supercluster.
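
To make the selection/election requirement concrete, here is a
deliberately naive sketch: among the liaison candidates that are
currently alive, the one with the lowest-sorting identifier speaks for
the subcluster.  The "lowest id wins" rule and the function name
elect_liaison are assumptions of mine, not something the CCI proposal
specifies, and a real election would also have to deal with partitions
and disagreement about who is alive.

```python
from typing import Iterable, Optional, Set


def elect_liaison(candidates: Iterable[str], alive: Set[str]) -> Optional[str]:
    """Return the lowest-sorting live candidate, or None if none is alive."""
    live = sorted(c for c in candidates if c in alive)
    return live[0] if live else None


# Hypothetical liaison candidates inside one subcluster.
candidates = ["n3", "n1", "n2"]

# With everyone up, n1 represents the subcluster.
first = elect_liaison(candidates, alive={"n1", "n2", "n3"})

# If n1 fails, n2 takes over; the identifier the subcluster presents
# to the supercluster would stay the same.
after_failure = elect_liaison(candidates, alive={"n2", "n3"})
```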

We want in-cluster communications to go through the CCI rather than
through an implementation detail (such as TCP/IP) because we do not
want to confuse communications by associating node identification with
any artifact, such as an IP address, which could be broken by an
architecture change, or even by a failover.
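
A small sketch of the point about identifiers versus artifacts:
callers address a stable node identifier, and the mapping to whatever
transport address currently backs it is resolved only at send time.
The Directory class, its methods, and the addresses are invented for
illustration, and the "send" just records what would have gone out.

```python
from typing import Dict, List, Tuple


class Directory:
    """Maps stable node identifiers to their current transport address."""

    def __init__(self) -> None:
        self._addr: Dict[str, str] = {}
        # Record of (node_id, address_used, payload) for each send.
        self.sent: List[Tuple[str, str, str]] = []

    def bind(self, node_id: str, address: str) -> None:
        # Called at join time and again whenever the backing address changes.
        self._addr[node_id] = address

    def send(self, node_id: str, payload: str) -> None:
        # Resolve the address at send time; callers never hold an address.
        self.sent.append((node_id, self._addr[node_id], payload))


directory = Directory()
directory.bind("db-service", "10.0.0.5")
directory.send("db-service", "ping")

# Failover: the service moves to another box, the identifier does not change.
directory.bind("db-service", "10.0.0.9")
directory.send("db-service", "ping")
```

The caller's code is identical before and after the failover; only the
binding changed.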

> A single node must be big enough to support sane load balancing; ie, big
> enough to run at least one (or more) "whole" resource entities / jobs.

OTOH, the grain size of a "whole job" can be tuned to fit the reality
of your hardware.

Hopefully the CCI will provide useful metrics about node capability
and current load to allow apples-to-apples comparisons for making
better load balancing decisions in heterogeneous clusters.
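
As an illustration of why both capability and load matter, here is a
sketch that compares nodes by spare capacity rather than by raw
utilization.  The metric names (capacity, utilization), the headroom
formula, and the numbers are assumptions for the example; the proposal
does not say which metrics the CCI would expose.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class NodeMetrics:
    node_id: str
    capacity: float      # hypothetical benchmark units, bigger is faster
    utilization: float   # fraction of capacity in use, 0.0 to 1.0

    def headroom(self) -> float:
        # Spare capacity in the same units as capacity.
        return self.capacity * (1.0 - self.utilization)


def pick_node(nodes: Iterable[NodeMetrics]) -> NodeMetrics:
    """Choose the node with the most spare capacity."""
    return max(nodes, key=lambda n: n.headroom())


nodes = [
    NodeMetrics("big", capacity=16.0, utilization=0.75),
    NodeMetrics("small", capacity=4.0, utilization=0.25),
]
chosen = pick_node(nodes)
```

Here the big box is picked even though it is the busier one in
percentage terms, because its remaining quarter is still more than the
small box's remaining three quarters.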

> "Ignorance more frequently begets confidence than does knowledge"

Very good - I know it does for me!

David L Nicol
Proudly ignorant of many and much