Without qdisk (with two_node=1) the cluster works fine. But I need qdisk for a later transition from 2 to 3 nodes without a cluster restart.
I have now built a test cluster on the same hardware and reproduced the problem.
The dlm messages appear only from time to time; often there are no messages at all after "Quorum formed, starting".
If I set clean_start=1, fencing starts fine, but I still hang on access to the cman_admin socket.
So, if you have any suggestions or new development packages for testing, I can install them and gather debug information.
I can open an official ticket for this, but since I installed unsupported packages, that may be the wrong way :)
But without your unofficial packages I still have a non-working qdisk and ccs_tool update ...
PS: I already tried installing the new kernel from http://people.redhat.com/dzickus/el5/36.el5/x86_64/ (which contains many DLM fixes), but without luck...
email: doc umc ua
CJSC Ukrainian Mobile Communications
49/2 Pobedy ave., room 4.26, 03680, Kyiv, Ukraine
Lon Hohberger wrote:
On Mon, Jul 23, 2007 at 04:37:42PM +0300, Eugene Melnichuk wrote:
> I have a problem with my cluster running on RHEL5 + updates from http://people.redhat.com/lhh/rhel5-test/
> I have a 2-node cluster with a shared quorum disk. qdiskd is running, but when I start the cman service it hangs on "Starting fencing".
> In my logs I have messages about regained quorum:
>
> Jul 21 15:50:18 arf-web1 qdiskd: <info> Assuming master role
> Jul 21 15:50:19 arf-web1 ccsd: Cluster is not quorate. Refusing connection.
> Jul 21 15:50:19 arf-web1 ccsd: Error while processing connect: Connection refused
> Jul 21 15:50:19 arf-web1 openais: [CMAN ] quorum regained, resuming activity
> Jul 21 15:50:20 arf-web1 clurgmgrd: <notice> Quorum formed, starting
> Jul 21 15:50:20 arf-web1 kernel: dlm: no local IP address has been set
> Jul 21 15:50:20 arf-web1 kernel: dlm: cannot start dlm lowcomms -12

The cause here is probably the problem. Does this happen without qdisk? I don't understand why qdisk would cause this.