[Cluster-devel] [RFC] Common cluster connection handler API


This issue has been bothering me for some time now.

Looking around the code, I have seen far too many different methods of connecting to libccs and cman, and most of them don't do it right.

The current implementations suffer from a set of race conditions at startup time and behave differently across daemons/subsystems, which makes starting the software confusing and dangerous.

What I see now:

- loops on ccs_connect/ccs_force_connect in blocking mode.
- loops on cman_init/cman_admin_init
- poor information to the users on what the daemon is waiting for
- loops to know when cman is actually available.
- attempts to call ccs_connect and exit on failure (clashes with other daemons waiting for ccs)
- attempts to call ccs_init and exit on failure (clashes with other daemons waiting for cman)
- use of cman_get_node to verify if cman has completed its configuration/startup when there is cman_is_active that fits exactly the same purpose.

As you can see this is not exactly the ideal solution.

So my suggestion boils down to a very simple API that takes care of connecting to libccs and cman. It should guarantee feedback to the user on what we are waiting for, and guarantee that the connecting application gets access to ccs/cman only when it is the right time to do so.

I don't have a strong opinion on what this API should look like, but here is one suggestion based on the most important bits of booting a cluster:

int cluster_connect(
	char *subsystem_name,
	int *ccsfd,
	cman_handle_t *ch,
	int cman_admin,
	int wait_quorum,
	int blocking,
	int max_attempts)

char *subsystem_name will be used to report to the users (logging or stderr) that this subsystem is waiting for ccs/cman/cman_is_active.

ccsfd and ch will hold the usual suspects: the ccs connection descriptor and the cman handle.

cman_admin = 0 normal cman_init, 1 cman_admin_init.

wait_quorum = 0 just go ahead, 1 loop also on cman_is_quorate.

blocking = set to 1 if you want to wait for the stuff to appear, 0 if you want a one-time shot.

max_attempts = (useful only in blocking mode), 0 loop forever, > 0 loop N times.
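
To make the interaction between blocking and max_attempts concrete, here is a minimal sketch in C. The helper name retry_budget is purely hypothetical (it is not part of libccs or libcman), and treating a negative max_attempts like 0 is my own assumption:

```c
#include <limits.h>

/*
 * Hypothetical helper, not part of libccs or libcman: maps the proposed
 * blocking/max_attempts pair to the number of connection attempts allowed.
 * INT_MAX stands in for "loop forever".
 */
static int retry_budget(int blocking, int max_attempts)
{
	if (!blocking)
		return 1;        /* one-time shot: max_attempts is ignored */
	if (max_attempts <= 0)
		return INT_MAX;  /* 0 = loop forever (negative handled the same: assumption) */
	return max_attempts;     /* loop at most N times */
}
```

So retry_budget(0, 5) allows a single attempt, while retry_budget(1, 3) allows three.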

the flow will look like:

connect to ccs
connect to cman
wait for cman_is_active
return 0 on success, or < 0 if we fail to connect within max_attempts, if blocking is set to 0 and something is not available yet, or on any other error for that matter.
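
The flow above could be sketched roughly as follows. This is only an illustration in C, not an existing API: run_stages, struct stage and ready_after_countdown are hypothetical names, the stage callbacks stand in for the real ccs/cman calls so that the example builds without the cluster libraries, and counting max_attempts per stage (rather than in total) is an assumption.

```c
#include <stdio.h>
#include <unistd.h>

/* One startup stage: returns 0 when ready, < 0 when not ready yet. */
typedef int (*stage_fn)(void *ctx);

struct stage {
	const char *what;   /* reported to the user while waiting, e.g. "ccs" */
	stage_fn try_once;
};

/*
 * Hypothetical sketch of the proposed flow. Runs each stage in order.
 * In blocking mode a failing stage is retried, at most max_attempts times
 * per stage (0 = forever), and the user is told what we are waiting for.
 * In non-blocking mode the first failure returns immediately.
 */
static int run_stages(const char *subsystem_name,
		      const struct stage *stages, int nstages, void *ctx,
		      int blocking, int max_attempts, unsigned int delay_secs)
{
	for (int i = 0; i < nstages; i++) {
		int attempts = 0;

		while (stages[i].try_once(ctx) < 0) {
			attempts++;
			if (!blocking ||
			    (max_attempts > 0 && attempts >= max_attempts))
				return -1;
			fprintf(stderr, "%s: waiting for %s (attempt %d)\n",
				subsystem_name, stages[i].what, attempts);
			sleep(delay_secs);
		}
	}
	return 0;
}

/*
 * Demo stand-in for a real readiness check: ctx points to a countdown of
 * failures remaining before the stage reports ready.
 */
static int ready_after_countdown(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -1;
	}
	return 0;
}
```

In a real daemon the stages would presumably wrap ccs_connect, cman_init or cman_admin_init, cman_is_active and, when wait_quorum is set, the quorum check. Keeping the loop in one place is what gives every subsystem the same behaviour and the same messages.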


