
Re: [Linux-cluster] GFS + DRBD Problems



On Tue, 4 Mar 2008, Marc Grimme wrote:

<fenceackserver user    = "root"
                passwd  = "password"
/>

[...]

Now you could telnet to port 12242 on the hung node, log in, and you
should automatically see whether or not it is in the manual fencing state.

Hmm, this doesn't seem to be responding. Is there a separate package
that needs to be installed to add this feature? I've not seen mkinitrd
moan about missing files, so I just assumed it was all there.

Check for comoonics-bootimage-fenceacksv; that's the software that must
be installed. Then rebuild the initrd.

OK. Is there anything else that needs to be done to enable it, or will
it "just work" if it is specified, as above, in cluster.conf?

It should. You can start and stop it with the
/etc/init.d/fenceacksv init script. Then telnet in and type help. It
SHOULD tell you all you need.

Ah, OK, that would be the problem, then. I disabled the init script in the
default run level. I thought it would get started up by the initrd in the
host root, not by the init script in the guest root.

It doesn't seem to start up correctly, though. It says "OK", but I cannot
see it running afterwards, and the machine isn't responding on port 12242
on any of its 3 interfaces (cluster, external, loopback).
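To check each interface from another machine without an interactive telnet session, a plain TCP connect probe is enough. This is a minimal Python sketch, not part of the comoonics tools; the function names `port_open` and `probe_all` are hypothetical, and it only tells you whether something accepts connections on port 12242, not whether that something is the fence acknowledgement server.

```python
import socket

# Port the fence acknowledgement server is expected to listen on (from this thread).
FENCEACK_PORT = 12242


def port_open(host: str, port: int = FENCEACK_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; closing it right away
        # is fine because we only care whether the handshake succeeded.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable networks, DNS errors.
        return False


def probe_all(hosts, port: int = FENCEACK_PORT, timeout: float = 2.0) -> dict:
    """Probe every address in hosts and map each one to True/False."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

For example, `probe_all(["127.0.0.1", "<cluster IP>", "<external IP>"])` (placeholder addresses) returns a dict; all `False` agrees with the daemon not running at all, while `True` only on loopback would point at it binding to the wrong address.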

Hmm, what does your cluster.conf say?

The <fenceackserver> section is above. The entire cluster.conf file was pasted here a couple of messages ago on this thread.

One other thing I noticed is that my clock seems to be consistently 5
hours out until the first ntpd sync. Could this be related? Could a
clock jump of this magnitude be confusing dlm?

Oh yes. I had this problem very recently. It produces very odd behaviour:
the first node to join the cluster froze (no cluster processes running any
more) and only the second one came up.

That seems to tally with my findings, too, except that in my case the single node gets confused on its own, without the other node ever joining.

Now that I have synced the BIOS clock to within a few seconds, the ntpd sync no longer gets things stuck: processes no longer hang in disk sleep state waiting for gdlm_plock to return.

Is this a known bug in dlm? If it isn't, how do I go about filing it?

Gordan

