[Linux-cluster] RHCS 3 "could not connect to service manager"

Karl Podesta kpodesta at redbrick.dcu.ie
Wed Nov 15 17:29:53 UTC 2006


On Wed, Nov 15, 2006 at 12:11:00PM -0500, Lon Hohberger wrote:
> > Does this sound familiar to anyone? Has anyone encountered anything like
> > this in their experience? 
> 
> It doesn't sound familiar, but the easiest thing to do now is to first
> try *without* bonding, then try again with it (both nodes in
> active-backup mode).
> 
> -- Lon

Thanks for the advice, Lon. Unfortunately the system is in production, so
we'll have to wait for an outage window before we can try it. However, we
have tried to simulate the setup using VMware, and with one node using
load balancing for bonding, we can reproduce the error ("msg-open: 
connection timed out. Could not connect to service manager") by disabling
one of the NICs in the bond and trying to relocate the service. 

When we do this, 50% of packets get through (i.e. load balancing is working
and we can ping the other node), but the service fails to relocate with the
above error. When we have both NICs enabled, 100% of packets get through, 
and service relocation works fine. So this seems to establish that network
problems such as partial packet loss can disrupt the relocation of services
if one of the nodes is using load balancing on its network bonding. Sound
reasonable?
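
To illustrate the reasoning above, here is a toy model in Python, not a
description of how the bonding driver or the cluster software actually
behaves. It assumes the bond hands packets to its NICs in strict round-robin
order, that a disabled NIC silently drops whatever it is given, and that
nothing is retransmitted. All function names are hypothetical. Under those
assumptions, single pings succeed half the time, while any exchange that
needs several consecutive packets to arrive cannot complete.

```python
def round_robin_delivery(num_packets, nic_up):
    """Return one boolean per packet: True if the packet was delivered.

    Assumption: packet i is handed to NIC (i mod number_of_nics), and a
    NIC marked False in nic_up drops every packet it is given.
    """
    return [nic_up[i % len(nic_up)] for i in range(num_packets)]


def ping_success_rate(num_pings, nic_up):
    """Fraction of independent single-packet pings that get through."""
    delivered = round_robin_delivery(num_pings, nic_up)
    return sum(delivered) / num_pings


def exchange_succeeds(packets_needed, nic_up):
    """True if a hypothetical exchange needing `packets_needed` consecutive
    packets completes. Assumption: no retransmission of lost packets."""
    return all(round_robin_delivery(packets_needed, nic_up))


if __name__ == "__main__":
    one_nic_down = [True, False]
    both_nics_up = [True, True]

    print("ping success, one NIC down:", ping_success_rate(100, one_nic_down))
    print("ping success, both NICs up:", ping_success_rate(100, both_nics_up))
    print("3-packet exchange, one NIC down:", exchange_succeeds(3, one_nic_down))
    print("3-packet exchange, both NICs up:", exchange_succeeds(3, both_nics_up))
```

Real traffic is more forgiving than this model, since TCP retransmits lost
packets and link monitoring may remove a dead NIC from the bond. Even so, it
shows why "ping works" on its own does not demonstrate that a multi-packet
connection to the service manager can complete before its timeout.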

We'll wait for an opportunity in the next few days to apply active-backup
to the bonding, but if anyone has any other musings in the meantime it would
be great to hear them of course. Thanks a lot!

Karl

-- 
Karl Podesta
Systems Engineer, Securelinx Ltd.
http://www.securelinx.com/



