[Linux-cluster] RE: [RFC] Generic Kernel API

Thu Oct 7 01:18:57 UTC 2004

On Tue, 2004-10-05 at 23:07, David Teigland wrote:
> On Tue, Oct 05, 2004 at 02:39:44PM -0700, Steven Dake wrote:
> 
> > if its possible people will do it..  
> 
> I think that's the point -- you can if you want.
> 
> > This leads to the problem that then the APIs cannot be trusted to
> > deliver certain guarantees such as agreed ordering of virtual synchrony.
> > If the APIs cannot be trusted to deliver, for example, agreed ordering,
> > then nobody will use agreed ordering and we will have a mess of
> > two-phase commit protocols on our hands..  Or worse, systems will not
> > operate correctly in a distributed fashion.
> 
> Here's the point of this whole exercise:  to allow /multiple/ varieties of
> cluster manager to live behind one API.  Some cm's, like yours, would take
> the virtual synchrony approach with everything that implies.  Other cm's
> may be intended for something else, not require the guarantees yours
> provides, and take a different course.
> 

The term cluster manager is wrong in this context; it implies that the
cluster is "managed", which in fact it is not.  What we are talking
about are low level services available to cluster applications.  Those
applications could be gfs, ssi, ais, or whatever.

I had always believed the point of this exercise was to come up with one
API that works for everyone that is not overly complicated (because it
is generic).  The absolutely most critical service required of a cluster
is communication.  With communication, everything else can be built on
top.

I thought we were in agreement that it is undesirable to have 3-4-6
different APIs with 3-4-6 different implementations of protocols because
the protocols provide different guarantees.

I think it is important we specify up front the set of guarantees that
EVERY messaging infrastructure (not just some, as you suggest) should
provide.  The alternative really helps no one: a generic API that
leaves many things unspecified will not be strict enough for most
users, though it may work for some.  In the end there may be 3-4-6
messaging services behind one API, with 3-4-6 projects each using a
separate messaging service and nothing interoperating.  If this happens
there is zero point to a common API.

> I thought the whole motivation was to allow for many cm implementations,
> each with their own unique characteristics, but all exporting their
> function through a common kernel API.  (Although I wasn't there, I heard
> "everyone else" agreed this was necessary.)
> 
> If there's only one kernel cm (one that uses VS as you suggest), then
> there's no point in pursuing this "common API" idea -- there would just be
> "the API" exported by "the cm".  This explains why we seem completely out
> of sync in this discussion.
> 
> You obviously don't see any need for different cm implementations (and
> think it's a bad idea.)  In theory you may be right, but I think this is
> mainly a practical question right now.  Someone else could probably
> produce some technical reasons why your CM isn't what they want --
> clustering is a pretty broad field and saying there's "only one right way"
> is a bit bold.  (In theory we only need one local file system after all.)
> In fact, I've thought there may be enough variation among cm's that
> sharing an API would be impossible -- I'm still not sure.
> 
> With cm's providing different functions and behavior, an "application"
> would obviously need to select a specific cm by name (each implementation
> has a unique name) to attach to and use.
> 

This puts us in the same boat we were in before a common API: there are
still several implementations, just behind one programming API.  But how
does that help?  If I have to select the messaging protocol I want to
use, then why not just call that protocol's own APIs directly?  A higher
level application is unlikely to be able to interchange different
protocols unless every protocol delivers a least common denominator
guarantee set.

I propose the LCD set is virtual synchrony.
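
To make the "least common denominator" argument concrete, here is a
rough sketch.  Every name in it (the cm names, the guarantee labels, the
helper functions) is hypothetical and comes from no real cluster API; it
only models implementations as sets of guarantees.

```python
# Hypothetical sketch: implementation names and guarantee labels are invented.
REGISTRY = {
    "cm_vs": {"reliable", "agreed_order", "membership_sync"},
    "cm_basic": {"reliable"},
}


def common_guarantees(registry):
    """Guarantees every implementation provides: the least common denominator."""
    sets = list(registry.values())
    return set.intersection(*sets) if sets else set()


def interchangeable(registry, required):
    """Implementations an application could swap between without changes."""
    return sorted(name for name, g in registry.items() if required <= g)


lcd = common_guarantees(REGISTRY)
needs_order = interchangeable(REGISTRY, {"agreed_order"})
needs_reliable = interchangeable(REGISTRY, {"reliable"})
```

An application that needs agreed ordering can only attach to one of the
two implementations here, so the common API buys it nothing.  Only the
guarantees in the common set are portable across implementations.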

> 
> > virtual synchrony requires membership and messaging to be integrated to
> > deliver on its model.  I can't think of more exotic membership systems
> > except perhaps intergroup.
> 
> Here's an example:  the lowest level cm provides basic
> membership/messaging.  It considers any node a member as long as it can
> communicate with it.  Now say there's a higher level membership system
> built above this that has a more restrictive policy on who can be a
> member.  It takes the membership info from the lower level cm, removes the
> members that don't meet its criteria and exports that new list as the
> members.  An application would have to be written, of course, to interface
> with one of the two cm's depending on what it needs.
> 

This model doesn't make any sense.  Either a processor is contained in a
configuration or it is not.  What would be the use case for restricting
membership within a configuration?  If you did this, the messaging
layer would then have to know to send only to the restricted list.
That sounds complicated, with no benefit that can't already be provided
by some simpler mechanism in virtual synchrony.
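
A minimal sketch of the layering problem, assuming a toy low-level cm
that broadcasts to every reachable node.  The class, the function, and
the `only` parameter are all hypothetical, invented for illustration.

```python
# Hypothetical sketch: none of these names come from a real cluster API.

class LowLevelCM:
    """Treats every reachable node as a member and broadcasts to all of them."""

    def __init__(self, reachable):
        self.members = set(reachable)
        self.delivered = {node: [] for node in self.members}

    def broadcast(self, msg, only=None):
        # 'only' is the extra knowledge the messaging layer needs once a
        # higher layer restricts membership.
        targets = self.members if only is None else self.members & set(only)
        for node in targets:
            self.delivered[node].append(msg)


def restricted_view(cm, allowed):
    """Higher-level membership: the low-level list minus nodes failing a policy."""
    return {node for node in cm.members if allowed(node)}


cm = LowLevelCM(["a", "b", "c"])
view = restricted_view(cm, lambda node: node != "c")

cm.broadcast("m1")             # messaging still uses the unrestricted list
cm.broadcast("m2", only=view)  # messaging told about the restricted list
```

Node "c" still receives "m1" even though the higher layer no longer
considers it a member.  Keeping it out requires passing the restricted
list down into the messaging layer, which is the complication described
above.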

> [This concept of layering additional features is one aspect of the common
> API idea that I'm not emphasizing as much as the basic concept of
> alternative implementations.]
> 
> 
> > > I think the question here is whether your messaging/membership system
> > > (currently in user space) would fit behind the API Patrick sent once
> > > ported to the kernel.  If not, then what needs to be changed so it would?
> > > The idea is for the API to be general enough to support a variety of
> > > clustering modules, including yours.
> > 
> > Virtual synchrony is the "one true model" for distributed computing. 
> 
> That may be, and if it's true I don't imagine any other cm's will exist
> behind this API in the long term.  The question was, will this API
> adequately export whatever your cm provides?
> 

I think, in terms of messaging, if we underspecify by choosing an API
that doesn't have strong requirements, we allow more implementations
with more variation.  On the surface this doesn't sound bad, except that
it removes the ability to easily replace one implementation of the API
with another.
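
As a rough illustration, assume two toy implementations behind the same
interface, one providing agreed ordering and one not.  The class names
are hypothetical, and the reordering is modelled deterministically
rather than with a real network.

```python
# Hypothetical sketch; the class and method names are illustrative only.

class AgreedOrderBus:
    """Every node delivers messages in one agreed (total) order."""

    def __init__(self, nodes):
        self.logs = {n: [] for n in nodes}

    def multicast(self, batch):
        # batch: list of (sender, msg).  One sequence is agreed on, here
        # simply the sorted order, and delivered identically everywhere.
        for item in sorted(batch):
            for log in self.logs.values():
                log.append(item)


class UnorderedBus:
    """Same interface, but each node delivers in its own arrival order."""

    def __init__(self, nodes):
        self.logs = {n: [] for n in nodes}

    def multicast(self, batch):
        # Model network reordering deterministically: odd-indexed nodes
        # see the batch reversed.
        for i, node in enumerate(sorted(self.logs)):
            seen = batch if i % 2 == 0 else list(reversed(batch))
            self.logs[node].extend(seen)


def replicas_agree(bus):
    """True if every node delivered the same messages in the same order."""
    logs = list(bus.logs.values())
    return all(log == logs[0] for log in logs)


batch = [("n1", "set x=1"), ("n2", "set x=2")]
ordered, unordered = AgreedOrderBus(["n1", "n2"]), UnorderedBus(["n1", "n2"])
ordered.multicast(batch)
unordered.multicast(batch)
```

An application that applies delivered messages to replicated state works
on the first bus and silently diverges on the second, even though both
present the same calls.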

> 
> > Other systems just don't deliver the features that are available in
> > virtual synchrony.  
> 
> They may not deliver those features by choice simply because the features
> aren't necessary for what they're designed to do.
> 
> 
> > This allows us the freedom to design any sort of distributed system if
> > we accept virtual synchrony must exist at the lowest level.
> 
> We're aiming for even more freedom -- the ability to reject even a
> VS-based cm.  If that's a foolish idea, then alternatives will either die
> or never sprout up.  From what I've heard, there's not a consensus on one
> true cm everyone will adopt.  I think it's unlikely to happen any time
> soon which means we need to allow for different approaches.
> 
> 
> > If virtual synchrony is not enforced by the API, then people that don't
> > care about virtual synchrony immediately could provide implementations
> > that don't support those features.  This would result in fragmented
> > implementations of clustering infrastructure which is what we are trying
> > to avoid.
> 
> As I said earlier, I thought the whole point of this was to allow for
> fragmentation but to agree on an API if possible.  If that's the case,
> then the API should probably be as permissive as possible.
> 

No, I disagree with that goal.  Fragmentation delivers absolutely no
advantage to anyone if every project implements its own group messaging
system.  It makes Linux more difficult to use, more difficult to
configure, and less performant.

> 
> > Not only that, these solutions would not be reliable in partitions,
> > merges, or faults because they would most likely not handle these
> > situations in a deterministic and correct fashion.  IMHO, it is
> > impossible to make a reliable distributed system if partitions, merges,
> > and faults are not addressed up front as part of the APIs and protocols.
> 
> Some people may not be as interested in reliability as you and I are.
> 
> I understand what you're saying and I think we'd like to use pretty much
> the same kernel cm in the end.  The common kernel API isn't really about
> what /we/ want, though, it's about what other people might want to do.  If
> it's possible to share a common API despite a diversity of implementations
> that would be nice -- at least that's the basis of this discussion.
> 
> Your goal (to get everyone to agree on a single kernel cm) might be
> possible, but it'll probably take a bit of work.  Getting everyone to
> share a common API would at least be a step in that direction.

First of all, if someone is doing clustering for high availability,
they are mostly interested in how the system behaves during partitions,
merges, and faults.  If they are doing it for performance reasons, they
will not use whatever cluster infrastructure we come up with, because it
is unlikely to scale to multi-thousand node setups.  Although virtual
synchrony can scale to that, there is no implementation today that
does.  So the main users of a kernel based infrastructure (high
availability clustering) absolutely do require, and are interested in,
partition, merge, and fault behavior.

A common API is a decent goal to begin with, as long as it enforces a
virtual synchrony model.  If it doesn't, then openais, for example,
won't be able to use it.  Maybe linux-ci could use it.  Of course, being
the self-centered engineer I am :) I'd like openais to be able to benefit
from any kernel work for clustering.

Thanks
-steve