
Re: [Linux-cluster] Severe problems with 64-bit RHCS on RHEL5.1

Harri Paivaniemi tietoenator com wrote:

So, this is my sad history with ver 5. Do you use 64-bit ver 5 and what's your feeling?

I only started using it with v5, and I have to say that I haven't had any real problems. Some of my clusters have been 64-bit, some 32-bit, and I haven't seen any differences yet.

My problems this time are:

1. 2-node cluster. I can't start just one node to get cluster services up: it hangs in fencing and waits until I start the second node, and immediately after that, when both nodes are starting cman, the cluster comes up. So if I have lost one node and have to restart the working node for some reason, I can't get the cluster up. It should work like before (both nodes are down, I start one, it fences the other and comes up). Now it just waits... The log says:

ccsd[25272]: Error while processing connect: Connection refused

This is such a common error message that it tells me nothing....

I have seen similar error messages before, and they have usually been caused by either the node names/interfaces/IPs not being listed correctly in the /etc/hosts file, or iptables firewall rules blocking communication between the nodes.
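As a rough illustration of the /etc/hosts half of that check, here is a minimal Python sketch. It only parses hosts-format text and reports cluster node names that are missing or that map to a loopback address (a mapping I would assume is unwanted for inter-node traffic). The function names and the example node names are made up for illustration and are not part of any cluster tool; it does not look at iptables at all.

```python
def parse_hosts(text):
    """Map every hostname/alias in hosts-format text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # The first entry for a name wins, as a resolver reading
            # the file top to bottom would pick it.
            table.setdefault(name, ip)
    return table


def check_nodes(hosts_text, node_names):
    """Return a list of human-readable problems, empty if none were found."""
    table = parse_hosts(hosts_text)
    problems = []
    for name in node_names:
        ip = table.get(name)
        if ip is None:
            problems.append(f"{name}: not listed")
        elif ip.startswith("127.") or ip == "::1":
            # Assumption: a node name on loopback is a misconfiguration.
            problems.append(f"{name}: maps to loopback {ip}")
    return problems
```

To run it against a real machine, something like check_nodes(open("/etc/hosts").read(), ["node1.example.com", "node2.example.com"]) would do, with the node names replaced by the ones used in your cluster configuration.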

2. qdisk doesn't work. 2-node cluster. I start it (both nodes at the same time) to get it up. It works ok, qdisk works, the heuristic works. Everything works. If I stop the cluster daemons on one node, that node can't rejoin the cluster without a complete reboot. It joins, the other node says ok, the node itself says ok, quorum is registered and the heuristic is up, but the node's quorum disk stays offline and the other node says this node is offline. If I reboot this machine, it joins the cluster ok.

I believe it's supposed to work that way. When a node fails it needs to be fully restarted before it is allowed back into the cluster. I'm sure this has been mentioned on the list recently.

3. Funny thing: the heuristic ping didn't work at all in the beginning, and support gave me a "ping-script" which made it work... so this describes quite well how experimental this cluster is nowadays...

I have to tell you it is a FACT that the basics are ok: fencing works ok in a normal situation, I don't have typos, the configs are in sync, everything is ok, but these problems still exist.

I've been in similar situations before, but in the end it always turned out to be me doing something silly (see above re: host files and iptables as examples).

I have sent sosreports etc. to RH support twice. They have spent 3 weeks and still can't say what's wrong...

Sadly, that seems to be the quality of commercial support from any vendor. Support nowadays seems to have only one purpose - a managerial back-covering exercise so they can pass the buck. I have always found that community support is several orders of magnitude better than commercial support in terms of both response speed and quality.

