[Linux-cluster] Inconsistent cluster view, shutdown, kernel panic
Patrick Caulfield
pcaulfie at redhat.com
Tue May 30 06:57:48 UTC 2006
Moreno Baricevic wrote:
>
> Hello,
>
> we are trying to install GFS (cluster-1.02 on vanilla 2.6.16.16) on a
> CentOS cluster of 70 "diskless" nodes.
>
> The structure is something like this:
>
>   +---+      GNBD-SERVERS     GNBD CLIENTS
>   |   |-----[node63]-----[node64 node65 node66 node67 node68 node69]
>   | S |.....
>   | A |.....
>   | N |-----[node07]-----[node08 node09 node10 node11 node12 node13]
>   |   |-----[node00]-----[node01 node02 node03 node04 node05 node06]
>   +---+
>
> All the nodes have a gigabit NIC and all the nodes see each other.
> Only the gnbd-servers have a fiber adapter to connect to the SAN.
>
> Everything works fine as long as we test on 33 nodes: 9 nodes with the
> fiber adapter (acting as both GFS nodes and gnbd-servers) and 24 gnbd
> clients (connected to 4 of the gnbd-servers). "Fine" means that we have
> been able to mount and use the GFS filesystem.
>
> When we try to start cman on 39 nodes (or worse, when we try with 63
> nodes), more or less half of the nodes soon get this:
>
> "kernel panic - not syncing: membership stopped responding"
>
> We tried to increase CMAN_CLUSTER_TIMEOUT and CMAN_QUORUM_TIMEOUT
> (/etc/init.d/cman), but the problem persists.
>
> We tried to boot the nodes 10 at a time, with a 2-minute delay between
> groups. As soon as we reach quorum (or one of the timeouts?) the
> nodes start collapsing due to "Inconsistent cluster view", "Shutdown",
> "No response to messages".
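
For reference, the staggered start described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: one
vote per node and a simple-majority quorum (neither is read from the posted
cluster.conf), and the node names and helper functions are hypothetical.

```python
def boot_batches(nodes, batch_size=10):
    """Split the node list into consecutive start-up groups."""
    return [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]


def majority_quorum(total_votes):
    """Simple majority, ASSUMING one vote per node (not taken from cluster.conf)."""
    return total_votes // 2 + 1


def batch_reaching_quorum(batches, quorum):
    """Return the 1-based index of the batch in which the running node
    count first reaches quorum, or None if it never does."""
    started = 0
    for index, batch in enumerate(batches, start=1):
        started += len(batch)
        if started >= quorum:
            return index
    return None


# Hypothetical names for the 63-node test; the real host list may differ.
nodes = [f"node{n:02d}" for n in range(63)]
batches = boot_batches(nodes)
quorum = majority_quorum(len(nodes))
print(len(batches), quorum, batch_reaching_quorum(batches, quorum))
```

Under those assumptions, 63 nodes give 7 groups and a quorum of 32 votes,
which is first reached while the fourth group is starting.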
>
> We also tried the patch supplied as a solution for bug report 187777,
> but nothing changes.
>
> Is there a limit on the number of nodes, a timeout, or any other issue
> that we didn't consider?
To be honest, cman has never been tested beyond 32 nodes to my knowledge. For
large clusters you may well be better off using gulm, at least in the short term.
> Here you can find the cluster.conf, logs from surviving and dead nodes,
> tcpdump for UDP:6809, nodes' /proc/cluster/{status,nodes,services}:
>
> http://www.democritos.it/~baro/gfs-test/
>
> There's a lot of stuff, let me know if you need something more specific.
I'll have a look through those logs... but it may take me some time!
Thanks,
Patrick
--
patrick