
Re: [Linux-cluster] Hang on start fence_tool join with qdisk




Hi,

Without qdisk (with two_node=1) the cluster works fine. But I need qdisk for the later transition from 2 to 3 nodes without a cluster restart.
I have now built a test cluster on the same hardware and reproduced the problem.
The dlm messages appear only from time to time; often there are no messages at all after "Quorum formed, starting".
If I set clean_start=1, fencing starts fine, but access to the cman_admin socket still hangs.
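
For illustration, the two setups being compared would look roughly like this in cluster.conf. This is only a sketch with assumed values, not the actual configuration from this cluster: the cluster name, config_version, the qdisk label, and the interval/tko timings are all hypothetical, and the clusternodes and fence device sections are omitted.

```xml
<!-- Variant A (assumed): plain two-node mode, no quorum disk -->
<cluster name="testcluster" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <!-- clean_start="1" makes fenced skip startup fencing -->
  <fence_daemon clean_start="1"/>
  <!-- clusternodes and fencedevices omitted -->
</cluster>

<!-- Variant B (assumed): two nodes with one vote each plus a
     one-vote quorum disk, so expected_votes is 3 and two_node is off -->
<cluster name="testcluster" config_version="2">
  <cman two_node="0" expected_votes="3"/>
  <!-- label, interval and tko are placeholder values -->
  <quorumd label="testqdisk" votes="1" interval="1" tko="10"/>
  <fence_daemon clean_start="1"/>
  <!-- clusternodes and fencedevices omitted -->
</cluster>
```

With variant B a third node can later be added by raising expected_votes, which is the reason for wanting qdisk instead of two_node=1.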

So if you have any suggestions or new development packages for testing, I can install them and gather debug information.
I could open an official ticket for this, but since I installed unsupported packages that may be the wrong way :)
But without your unofficial packages I still have a non-working qdisk and ccs_tool update ...

PS: I already tried installing the new kernel from http://people.redhat.com/dzickus/el5/36.el5/x86_64/ (which contains many DLM fixes), but without luck...


--
Eugene Melnichuk
Leading Engineer
email: doc umc ua
mob: +380503304043
pbx: +380501105731
CJSC Ukrainian Mobile Communications
49/2 Pobedy ave., room 4.26, 03680, Kyiv, Ukraine



Lon Hohberger wrote:
On Mon, Jul 23, 2007 at 04:37:42PM +0300, Eugene Melnichuk wrote:
  
I have a problem with my cluster running on RHEL5 + updates from
http://people.redhat.com/lhh/rhel5-test/

I have a 2-node cluster with a shared quorum disk. qdiskd is running, but
when I start the cman service it hangs at "Starting fencing".
In my logs I have messages about regained quorum:

Jul 21 15:50:18 arf-web1 qdiskd[7326]: <info> Assuming master role
Jul 21 15:50:19 arf-web1 ccsd[8188]: Cluster is not quorate.  Refusing connection.
Jul 21 15:50:19 arf-web1 ccsd[8188]: Error while processing connect: Connection refused
Jul 21 15:50:19 arf-web1 openais[8200]: [CMAN ] quorum regained, resuming activity
Jul 21 15:50:20 arf-web1 clurgmgrd[7746]: <notice> Quorum formed, starting
Jul 21 15:50:20 arf-web1 kernel: dlm: no local IP address has been set
Jul 21 15:50:20 arf-web1 kernel: dlm: cannot start dlm lowcomms -12
    

The dlm errors here are probably the cause of the problem.  Does this happen without qdisk?
I don't understand why qdisk would cause this.

  

