
[Linux-cluster] Unable to connect to cluster infrastructure - cluster died



Hi all,
I had a perfectly working 2-node cluster.

I saw that kernel security updates and a cluster bugfix update were available, so I waited two weeks and today decided to apply them.

I disabled my cluster service (oracle), patched both machines and rebooted.

After the reboot I got:

[root lvzbe1 kernel]# clustat
Could not connect to cluster service

and a bunch of these:

Oct 13 13:51:55 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3840 seconds.
Oct 13 13:52:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3870 seconds.
Oct 13 13:52:56 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3900 seconds.
Oct 13 13:53:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3930 seconds.

Cluster DIED.

I investigated and discovered that someone _forgot_ to compile dlm-kernel-smp and cman-kernel-smp for the latest Red Hat kernel (2.6.9-42.0.3.ELsmp).

This is the "old" kernel:

[root lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.2.ELsmp/kernel/
[root lvzbe1 kernel]# ls -la
total 44
drwxr-xr-x  10 root root 4096 Sep  4 10:17 .
drwxr-xr-x   3 root root 4096 Oct 13 12:56 ..
drwxr-xr-x   3 root root 4096 Sep  4 10:17 arch
drwxr-xr-x   2 root root 4096 Oct 13 12:56 cluster
drwxr-xr-x   2 root root 4096 Sep  4 10:17 crypto
drwxr-xr-x  29 root root 4096 Sep  4 10:17 drivers
drwxr-xr-x  22 root root 4096 Sep  4 10:17 fs
drwxr-xr-x   3 root root 4096 Sep  4 10:17 lib
drwxr-xr-x  13 root root 4096 Sep  4 10:17 net
drwxr-xr-x  10 root root 4096 Sep  4 10:17 sound
[root lvzbe1 kernel]#


And this is the "new" one:

[root lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.3.ELsmp/kernel/
[root lvzbe1 kernel]# ls -la
total 36
drwxr-xr-x   9 root root 4096 Oct 13 12:20 .
drwxr-xr-x   3 root root 4096 Oct 13 12:31 ..
drwxr-xr-x   3 root root 4096 Oct 13 12:20 arch
drwxr-xr-x   2 root root 4096 Oct 13 12:20 crypto
drwxr-xr-x  29 root root 4096 Oct 13 12:20 drivers
drwxr-xr-x  22 root root 4096 Oct 13 12:20 fs
drwxr-xr-x   3 root root 4096 Oct 13 12:20 lib
drwxr-xr-x  13 root root 4096 Oct 13 12:20 net
drwxr-xr-x  10 root root 4096 Oct 13 12:20 sound
[root lvzbe1 kernel]#


As you can see, the latest kernel does not have the "cluster" directory.
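A minimal sketch for checking this on each node, assuming the module layout shown above (`cman.ko` and `dlm.ko` under `kernel/cluster` in each kernel's module tree). `check_cluster_modules` is a hypothetical helper written for illustration, not part of the cluster packages:

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel's module tree contains the
# cman and dlm modules under kernel/cluster (the layout seen above).
# Usage: check_cluster_modules KERNEL_VERSION [MODULES_ROOT]
# Returns 0 if both modules are present, 1 if any is missing.
check_cluster_modules() {
    kver="$1"
    root="${2:-/lib/modules}"
    dir="$root/$kver/kernel/cluster"
    missing=0
    for mod in cman.ko dlm.ko; do
        if [ -f "$dir/$mod" ]; then
            echo "$kver: found $mod"
        else
            echo "$kver: MISSING $mod"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_cluster_modules "$(uname -r)"` run on a node booted into 2.6.9-42.0.3.ELsmp should report both modules as missing, given the listing above.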

This is the latest cman package; its modules are installed only for the old kernel (2.6.9-42.0.2.ELsmp):

[root lvzbe1 kernel]# rpm -qil cman-kernel-smp-2.6.9-45.5
Name        : cman-kernel-smp              Relocations: (not relocatable)
Version     : 2.6.9                             Vendor: Red Hat, Inc.
Release     : 45.5                          Build Date: Fri 18 Aug 2006 07:05:34 PM CEST
Install Date: Fri 13 Oct 2006 12:56:36 PM CEST      Build Host: hs20-bc1-3.build.redhat.com
Group       : System Environment/Kernel     Source RPM: cman-kernel-2.6.9-45.5.src.rpm
Size        : 340198                           License: GPL
Signature   : DSA/SHA1, Tue 22 Aug 2006 09:51:57 PM CEST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : cman-kernel-smp - The Cluster Manager kernel smp modules
Description :
cman-kernel-smp - The Cluster Manager kernel smp modules
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.ko
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.symvers
[root lvzbe1 kernel]#


And this is the latest dlm package, again built only for 2.6.9-42.0.2.ELsmp:

rpm -qil dlm-kernel-smp-2.6.9-44.2
Name        : dlm-kernel-smp               Relocations: (not relocatable)
Version     : 2.6.9                             Vendor: Red Hat, Inc.
Release     : 44.2                          Build Date: Tue 26 Sep 2006 10:49:24 PM CEST
Install Date: Fri 13 Oct 2006 12:20:35 PM CEST      Build Host: hs20-bc2-3.build.redhat.com
Group       : System Environment/Kernel     Source RPM: dlm-kernel-2.6.9-44.2.src.rpm
Size        : 329858                           License: GPL
Signature   : DSA/SHA1, Thu 28 Sep 2006 09:44:31 PM CEST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : dlm-kernel-smp - The Distributed Lock Manager kernel modules.
Description :
dlm-kernel-smp - The Distributed Lock Manager kernel-smp modules.
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.ko
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.symvers
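The same mismatch can be spotted from the package side: both file lists above install only into the 2.6.9-42.0.2.ELsmp tree. A rough sketch, assuming a package file list (such as `rpm -ql` output) is fed on stdin; both function names are hypothetical and chosen for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read a package file list (e.g. `rpm -ql` output)
# on stdin and print each distinct kernel version directory it installs
# into under /lib/modules.
kernel_versions_in_filelist() {
    sed -n 's|^/lib/modules/\([^/]*\)/.*|\1|p' | sort -u
}

# Hypothetical helper: succeed only if the file list on stdin installs
# modules for the given kernel version (e.g. the output of `uname -r`).
filelist_targets_kernel() {
    kernel_versions_in_filelist | grep -qxF "$1"
}
```

For example: `rpm -ql cman-kernel-smp | filelist_targets_kernel "$(uname -r)" || echo "cman-kernel-smp has no modules for $(uname -r)"`. With the file lists shown above and the new kernel booted, this check would fail for both cman-kernel-smp and dlm-kernel-smp.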

Luckily this is not (yet) a production system, and I REALLY hope I did something wrong, even though I'm sure I did not.

Can I download the cman-kernel and dlm-kernel source RPMs and compile them myself while waiting for your answers?


Matteo



