
Re: [Linux-cluster] Using cman,etc for a non-gfs app

On Thu, 2005-06-23 at 11:23 -0400, Olivier Crete wrote:

> > If you use CMAN's service manager, you will be able to tell if the app
> > has crashed (all nodes in that service group will be notified of the
> > state change).
> In the RHEL4 branch there does not seem to be a userspace API for the
> service manager, apart from the ioctl and libmagma. Is libmagma your
> long-term API? Also, can libmagma be used in non-GPL apps? I saw some
> scary comments in magmamsg.h...


libmagma is LGPL.  The theory is that you could write any app and link
it dynamically against it.  Furthermore, the idea was that, on the
back-end, you could make it load a non-Free plugin to talk to a non-Free
cluster infrastructure.

libmagmamsg is GPL, because it contains code chunks from an older GPL
project.  The code is quite awful, but it works.  I hope it gets replaced
with a nice cluster-agnostic message system at some point, one which can
be used for more than just cluster stuff.

> > Internal deadlocks are harder to detect from the cluster infrastructure
> > perspective.  I'd consider using the kernel watchdog timer.
> An easy way would be to have a cluster watchdog (i.e. the app must
> "ping" the cman daemon at least once in X seconds, and if it doesn't,
> it's considered deadlocked).

As it stands now, kernel-mode cman doesn't have this kind of capability.
I could be mistaken, of course.
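
For illustration only, the deadline check behind such a "ping at least
once in X seconds" scheme could be sketched as below.  This is not a cman
API: the struct and the hb_* names are hypothetical, and the caller is
assumed to supply timestamps from a monotonic clock, e.g.
clock_gettime(CLOCK_MONOTONIC).

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical heartbeat record kept by whatever monitors the app. */
struct heartbeat {
    time_t last_ping;  /* time of the most recent ping, in seconds */
    time_t timeout;    /* the "X seconds" allowed between pings */
};

/* Start tracking; the app is considered alive as of `now`. */
void hb_init(struct heartbeat *hb, time_t timeout, time_t now)
{
    hb->last_ping = now;
    hb->timeout = timeout;
}

/* Called each time the app pings the monitor. */
void hb_ping(struct heartbeat *hb, time_t now)
{
    hb->last_ping = now;
}

/* True once more than `timeout` seconds have passed without a ping,
 * i.e. the app should be treated as deadlocked. */
bool hb_expired(const struct heartbeat *hb, time_t now)
{
    return now - hb->last_ping > hb->timeout;
}
```

What the monitor does once hb_expired() returns true (fence the node,
reboot, notify the service group) is a separate policy decision.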

> > First off:  An application crashing generally shouldn't cause an
> > eviction of the node from the cluster.  There should be other
> > cleanup/coordination mechanisms in place.  Ok, that said:
> Our application uses semi-shared storage, and if it crashes.. it may
> leave it in an unknown state.. and the easiest way is just to reboot the
> machine and have another machine take over the storage..

I'd definitely wire the kernel watchdog timer into your app.  It covers
both the "crash" and "hang" cases: either way the app stops resetting the
timer, and the machine reboots.
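
A minimal sketch of what that wiring might look like with the standard
Linux watchdog device follows.  The wd_* helper names are made up for
this example; "/dev/watchdog" is the conventional device path, and the
sketch assumes a watchdog driver is loaded (hardware, or softdog) and
that the app has permission to open the device.  The timeout is whatever
the driver is configured with.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the watchdog device, e.g. wd_open("/dev/watchdog").
 * The countdown starts as soon as the device is open.
 * Returns a file descriptor, or -1 on failure. */
int wd_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Reset the countdown.  Any write to the device counts as a keepalive.
 * Call this from the app's main loop, so a hang stops the writes.
 * Returns 0 on success, -1 on failure. */
int wd_pet(int fd)
{
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Orderly shutdown: writing 'V' just before close is the "magic close"
 * that asks the driver to disarm the timer (if the driver supports it
 * and was not configured with nowayout).  If the app crashes instead,
 * no 'V' is written and the machine resets when the timer runs out.
 * Returns 0 on success, -1 on failure. */
int wd_close(int fd)
{
    int rc = write(fd, "V", 1) == 1 ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The important design point is to call wd_pet() only from the code path
that proves the app is making progress, not from a separate timer thread
that would keep running through a deadlock.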

> > * With libgulm, you can register as an "important" service: "If this
> > process dies, evict & fence me."
> But gulm is going away, right ?

Maybe ;)

> > The other caveat was that you didn't want to be controlled by resource
> > scripts / managers, right?
> Ideally, I'd want to reduce the amount of forking... Especially when a critical event happens. 


-- Lon
