
Re: [Linux-cluster] Unable to connect to cluster infrastructure - cluster died



On 10/13/06, Matteo Catanese <m catanese kinetikon com> wrote:
Hi all,
I had a perfectly working 2-node cluster.

I saw kernel security updates and a cluster bugfix update, so I waited
2 weeks and decided, today, to apply the updates.

I disabled my cluster service (oracle), patched both machines and
rebooted.

After the reboot I had:

[root lvzbe1 kernel]# clustat
Could not connect to cluster service

and a bunch of
Oct 13 13:51:55 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3840 seconds.
Oct 13 13:52:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3870 seconds.
Oct 13 13:52:56 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3900 seconds.
Oct 13 13:53:26 lvzbe2 ccsd[3381]: Unable to connect to cluster infrastructure after 3930 seconds.

Cluster DIED.

I investigated and discovered that someone _forgot_ to
build dlm-smp and cman-smp for the latest Red Hat kernel.

This is the "old" kernel:

[root lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.2.ELsmp/kernel/
[root lvzbe1 kernel]# ls -la
total 44
drwxr-xr-x  10 root root 4096 Sep  4 10:17 .
drwxr-xr-x   3 root root 4096 Oct 13 12:56 ..
drwxr-xr-x   3 root root 4096 Sep  4 10:17 arch
drwxr-xr-x   2 root root 4096 Oct 13 12:56 cluster
drwxr-xr-x   2 root root 4096 Sep  4 10:17 crypto
drwxr-xr-x  29 root root 4096 Sep  4 10:17 drivers
drwxr-xr-x  22 root root 4096 Sep  4 10:17 fs
drwxr-xr-x   3 root root 4096 Sep  4 10:17 lib
drwxr-xr-x  13 root root 4096 Sep  4 10:17 net
drwxr-xr-x  10 root root 4096 Sep  4 10:17 sound
[root lvzbe1 kernel]#


and this is the "new" one:

[root lvzbe1 kernel]# cd /lib/modules/2.6.9-42.0.3.ELsmp/kernel/
[root lvzbe1 kernel]# ls -la
total 36
drwxr-xr-x   9 root root 4096 Oct 13 12:20 .
drwxr-xr-x   3 root root 4096 Oct 13 12:31 ..
drwxr-xr-x   3 root root 4096 Oct 13 12:20 arch
drwxr-xr-x   2 root root 4096 Oct 13 12:20 crypto
drwxr-xr-x  29 root root 4096 Oct 13 12:20 drivers
drwxr-xr-x  22 root root 4096 Oct 13 12:20 fs
drwxr-xr-x   3 root root 4096 Oct 13 12:20 lib
drwxr-xr-x  13 root root 4096 Oct 13 12:20 net
drwxr-xr-x  10 root root 4096 Oct 13 12:20 sound
[root lvzbe1 kernel]#


As you can see, the latest kernel does not have the "cluster" directory.

This is the latest cman:

[root lvzbe1 kernel]# rpm -qil cman-kernel-smp-2.6.9-45.5
Name        : cman-kernel-smp              Relocations: (not relocatable)
Version     : 2.6.9                             Vendor: Red Hat, Inc.
Release     : 45.5                          Build Date: Fri 18 Aug 2006 07:05:34 PM CEST
Install Date: Fri 13 Oct 2006 12:56:36 PM CEST      Build Host: hs20-bc1-3.build.redhat.com
Group       : System Environment/Kernel     Source RPM: cman-kernel-2.6.9-45.5.src.rpm
Size        : 340198                           License: GPL
Signature   : DSA/SHA1, Tue 22 Aug 2006 09:51:57 PM CEST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : cman-kernel-smp - The Cluster Manager kernel smp modules
Description :
cman-kernel-smp - The Cluster Manager kernel smp modules
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.ko
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/cman.symvers
[root lvzbe1 kernel]#


and this is the latest dlm:

rpm -qil dlm-kernel-smp-2.6.9-44.2
Name        : dlm-kernel-smp               Relocations: (not relocatable)
Version     : 2.6.9                             Vendor: Red Hat, Inc.
Release     : 44.2                          Build Date: Tue 26 Sep 2006 10:49:24 PM CEST
Install Date: Fri 13 Oct 2006 12:20:35 PM CEST      Build Host: hs20-bc2-3.build.redhat.com
Group       : System Environment/Kernel     Source RPM: dlm-kernel-2.6.9-44.2.src.rpm
Size        : 329858                           License: GPL
Signature   : DSA/SHA1, Thu 28 Sep 2006 09:44:31 PM CEST, Key ID 219180cddb42a60e
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : dlm-kernel-smp - The Distributed Lock Manager kernel modules.
Description :
dlm-kernel-smp - The Distributed Lock Manager kernel-smp modules.
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.ko
/lib/modules/2.6.9-42.0.2.ELsmp/kernel/cluster/dlm.symvers

Luckily this is not (yet) a production system, and I REALLY hope I
did something wrong, even though I'm sure I did not.

Can I download cman-kernel.src.rpm and dlm-kernel.src.rpm and compile
them myself while waiting for answers from you?


Matteo



--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster


The cluster packages are kernel-specific and lag behind normal kernel
updates. I'm not sure whether they release cluster updates outside the
update cycle, though; I haven't been using them for more than two updates.
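
Since each module package installs into one specific kernel's directory,
it can help to check for cman.ko and dlm.ko before rebooting into a new
kernel. A minimal sketch, assuming the modules live under kernel/cluster
as in the listings above; the function name check_cluster_modules and its
optional second argument (an alternate modules root, handy for testing)
are made up for illustration:

```shell
# Report whether the cman and dlm modules exist for a given kernel version.
# $1 = kernel version (e.g. the output of `uname -r`)
# $2 = modules root, defaults to /lib/modules (hypothetical override)
check_cluster_modules() {
    ccm_kver="$1"
    ccm_root="${2:-/lib/modules}"
    ccm_missing=""
    for ccm_mod in cman dlm; do
        # The packages shown above install <module>.ko under kernel/cluster
        if [ ! -f "$ccm_root/$ccm_kver/kernel/cluster/$ccm_mod.ko" ]; then
            ccm_missing="$ccm_missing $ccm_mod"
        fi
    done
    if [ -n "$ccm_missing" ]; then
        echo "kernel $ccm_kver: missing cluster modules:$ccm_missing"
        return 1
    fi
    echo "kernel $ccm_kver: cluster modules present"
    return 0
}
```

For example, run "check_cluster_modules 2.6.9-42.0.3.ELsmp" before
changing the default boot kernel; a non-zero exit status means the
cluster modules for that kernel are not installed yet.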

