Could you please tell me whether my problem with the cluster service restarting locally is fixed in RHCS3U7.
Thanks in advance
On Mon, 13 Mar 2006 Lon Hohberger wrote :
>On Sat, 2006-03-11 at 10:50 +0000, saju john wrote:
> > Dear Mr. Hohberger,
> > Thanks for the reply.
> > I saw your comments on the problem I reported, i.e. lock traffic is
> > getting network-starved.
>It could be getting I/O starved too, which might explain more given that
>this seems to happen on one node. When running just one node and the
>service restarts, are the symptoms the same? Does it report these kinds
>of errors, or are they different?
>[quote from your previous mail]
>clusvcmgrd: <err> Unable to obtain cluster lock: Connection
>clulockd: <warning> Denied A.B.C.D: Broken pipe
>clulockd: <err> select error: Broken pipe
>If they're different in the one-node case, what are the errors? Also,
>are there any other errors in the logs?
> > My assumption is that the problem is due to some corruption of the
> > metadata written to the quorum partition, as both nodes write to the
> > quorum concurrently.
>I really doubt that. In the case of lock information, only one node
>writes at a time anyway...
> > Maybe it is due to a bug in the raw device driver; I am not sure. The
> > interesting question then is how the cluster worked all this time (for
> > me, around one year without any major problem).
>The odds of random, block-level corruption going undetected when reading
>from the raw partitions are low - between (2^32):1 and (2^96):1 against
>per block, based on internal consistency checks that clumanager
>performs. My math might be a little off, but it requires two randomly
>correct 32-bit magic numbers and one randomly valid 32-bit CRC, with
>other data incorrect to cause a problem.
>Specifically in the lock case, a lock block which passed all of the
>consistency checks but was *actually* corrupt would almost always cause
>clulockd to crash.
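The consistency checks described above can be sketched in a few lines: a corrupt block only slips through if it happens to carry two correct 32-bit magic numbers and a matching 32-bit CRC, which is where the roughly 2^-96 per-block figure comes from. This is a minimal illustration; the field layout and magic values are hypothetical, not clumanager's actual on-disk format:

```python
import struct
import zlib

# Hypothetical lock-block layout: two 32-bit magic numbers, a 32-bit
# CRC over the payload, then the payload. These magic values are made
# up for illustration; they are not clumanager's real constants.
MAGIC1 = 0x11DEAD11
MAGIC2 = 0x22BEEF22

def pack_block(payload: bytes) -> bytes:
    """Build a block that passes all three consistency checks."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack("<III", MAGIC1, MAGIC2, crc) + payload

def block_is_valid(block: bytes) -> bool:
    """Replicate the three checks: two magics plus a CRC32 match.
    Random corruption has to defeat all three at once to go
    undetected, i.e. roughly a 2^-96 chance per block."""
    if len(block) < 12:
        return False
    m1, m2, crc = struct.unpack("<III", block[:12])
    if m1 != MAGIC1 or m2 != MAGIC2:
        return False
    return (zlib.crc32(block[12:]) & 0xFFFFFFFF) == crc

good = pack_block(b"lock state v1")
assert block_is_valid(good)

# Flip a single payload bit: the CRC check rejects the block.
corrupt = good[:12] + bytes([good[12] ^ 0x01]) + good[13:]
assert not block_is_valid(corrupt)
```

The point of the sketch is only the arithmetic: each 32-bit check multiplies the chance of accidental acceptance by 2^-32, so passing all three by chance is vanishingly unlikely, which matches the estimate quoted above.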
>Timeout errors mean that clulockd didn't respond to a request in a given
>amount of time, and can be caused by either network saturation or poor
>raw I/O performance to shared storage. It looks like it's getting to an
>incoming request too late...
> > Could you please consider this as well when releasing RHCS3U7.
>If this is a critical issue for you, then you should file a ticket with
>Red Hat Support if you have not already done so:
>If you think this is a bug, you can also file a Bugzilla, and we will
>get to it when we can: