
Re: [Linux-cluster] Using cman,etc for a non-gfs app



On Wed, 2005-06-22 at 15:17 -0400, Olivier Crete wrote:
> Hi,
> 
> For the last few days, I have been looking at the cman stack for our
> application. But I have a few questions.
> 
> Our application is asymmetric: we have a (duplicated, active-passive)
> master server and work nodes. What I need from cman is to know the
> state of each node and to be notified when that state changes. The policy
> decisions (fail-over, etc.) would be taken by our master server.
> From what I can see, cman/ucman can already do that.
> 
> But I also need to monitor the application (some kind of application
> heartbeat) so I can tell whether the app has deadlocked or segfaulted,
> and inform the masters (active/passive) of what happened so they can
> make the proper decision.
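
A master-side check for the application heartbeat described above could look like the minimal sketch below. It assumes each work node periodically sends a heartbeat message over the application's own channel (not shown) and that nodes are numbered 0 to MAX_NODES-1. `struct hb_table`, `hb_record()`, `hb_is_stale()` and `MAX_NODES` are hypothetical names for illustration, not part of cman or any cluster library.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical master-side heartbeat table, one slot per work node.
 * MAX_NODES and the timeout value are illustrative assumptions. */
#define MAX_NODES 16

struct hb_table {
    time_t last_seen[MAX_NODES]; /* 0 = never heard from this node */
    time_t timeout;              /* seconds of silence before a node is suspect */
};

/* Record a heartbeat received from node `id` at time `now`. */
static void hb_record(struct hb_table *t, int id, time_t now)
{
    if (id >= 0 && id < MAX_NODES)
        t->last_seen[id] = now;
}

/* True if node `id` has sent at least one heartbeat and the most recent
 * one is older than the timeout, i.e. its app may be hung or dead. */
static bool hb_is_stale(const struct hb_table *t, int id, time_t now)
{
    if (id < 0 || id >= MAX_NODES || t->last_seen[id] == 0)
        return false;
    return now - t->last_seen[id] > t->timeout;
}
```

On each received heartbeat the master would call `hb_record(&t, id, time(NULL))` and periodically scan with `hb_is_stale(&t, id, time(NULL))`. A stale node is only *suspected* of being hung; the fail-over decision still belongs to the master's own policy.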

If you use CMAN's service manager, you will be able to tell if the app
has crashed (all nodes in that service group will be notified of the
state change).

Internal deadlocks are harder to detect from the cluster infrastructure
perspective.  I'd consider using the kernel watchdog timer.
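
As a rough sketch of the watchdog approach, assuming the standard Linux watchdog character device (`/dev/watchdog`): opening it arms the timer, any write resets it, and writing the character 'V' just before closing ("magic close") asks drivers that support it to disarm the timer. If the application's main loop deadlocks, it stops writing and the machine is rebooted once the driver-dependent timeout expires. The helper names `wd_open()`, `wd_pet()` and `wd_close()` are hypothetical; opening the real device normally requires root.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open the watchdog device, e.g. "/dev/watchdog". On Linux, opening
 * it arms the timer. Returns the fd, or -1 on error. */
static int wd_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* "Pet" the watchdog: any write resets the timer. Call this only from
 * the application's main loop, so a deadlocked loop stops petting it. */
static bool wd_pet(int fd)
{
    ssize_t n;
    do {
        n = write(fd, "\0", 1);
    } while (n < 0 && errno == EINTR);
    return n == 1;
}

/* Clean shutdown: write 'V' ("magic close") so drivers that support it
 * disarm the timer instead of rebooting after the close. */
static bool wd_close(int fd)
{
    bool ok = write(fd, "V", 1) == 1;
    return close(fd) == 0 && ok;
}
```

The design point is where `wd_pet()` is called: placing it in the same loop that does the application's real work (rather than in a separate timer thread) is what lets a deadlock in that loop turn into a reboot. Note that drivers built with "nowayout" ignore the magic close.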

> It would also be nice to have a library version of fence, but for now I
> guess I can just system() fence_node (that does not use fenced, right?).
> Or something like stonithd (from the linux-ha folks), where the fencing
> equipment can be connected to different nodes but still be controlled
> transparently. And I want to retain control from my app...

First off: an application crashing shouldn't generally cause the node to
be evicted from the cluster. There should be other cleanup/coordination
mechanisms in place. OK, that said:

* With libgulm, you can register as an "important" service: "If this
process dies, evict & fence me."

* libmagma provides cp_fence() / clu_fence() which work on both CMAN and
gulm.

* You can fork/exec the fence_node command.
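
A minimal sketch of the fork/exec option, assuming `fence_node` takes the node name as its argument and, like most commands, exits with status 0 on success. `run_fence()` is a hypothetical helper; it takes the command name as a parameter so the same code can be exercised with a harmless command. Unlike `system()`, it does not pass the node name through a shell.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `cmd node` (e.g. cmd = "fence_node") in a child process and wait
 * for it. Returns true only if the command exited normally with status 0. */
static bool run_fence(const char *cmd, const char *node)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;                  /* fork failed */
    if (pid == 0) {
        execlp(cmd, cmd, node, (char *)NULL);
        _exit(127);                    /* exec failed: same code a shell uses */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return false;              /* could not reap the child */
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Called as `run_fence("fence_node", "node3")` (node name hypothetical), this blocks until fencing finishes, which keeps the decision and its outcome inside the application, as requested above.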

> Oh yeah, and I need something relatively stable before September too...
> Can I do that with your stuff?

The other caveat was that you didn't want to be controlled by resource
scripts / managers, right?

-- Lon

