[Linux-cluster] Freeze with cluster-2.03.11
Ben Yarwood
ben.yarwood at juno.co.uk
Fri Mar 27 02:21:01 UTC 2009
Replaying a journal as shown below usually indicates, I believe, that a node
has withdrawn from that file system. You should grep the messages log on all
nodes for 'GFS'; if any node is reporting errors with this fs, then it will
need rebooting/fencing before access to that fs can be restored.
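
If you would rather script that check than grep by hand, something along
these lines might do. It is only a rough sketch: the regular expression is
modelled on the GFS lines quoted below, and the SUSPECT keywords are my
assumption of what an error or withdraw message contains, so adjust them to
whatever your logs actually show. The function name is made up for this
example.

```python
import re
import sys
from collections import defaultdict

# Matches GFS kernel messages in the format quoted in this thread, e.g.
# "... kernel: GFS: fsid=kfki:home.1: jid=3: Replaying journal..."
GFS_LINE = re.compile(r"GFS: fsid=(?P<fsid>[^:\s]+:[^:\s]+): (?P<msg>.*)")

# Assumption: substrings that mark a problem. Not an authoritative list.
SUSPECT = ("fatal", "withdraw", "error")


def gfs_problems(lines):
    """Return {fsid: [message, ...]} for GFS lines that look like errors."""
    found = defaultdict(list)
    for line in lines:
        m = GFS_LINE.search(line)
        if m and any(word in m.group("msg").lower() for word in SUSPECT):
            found[m.group("fsid")].append(m.group("msg"))
    return dict(found)


if __name__ == "__main__":
    # Pass one or more log files (copied from each node) as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for fsid, msgs in gfs_problems(fh).items():
                print(path, fsid, len(msgs), "suspect message(s)")
```

Run it against the messages file from every node; any fsid it reports on a
node is a hint that this node is the one to reboot or fence.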
Ben
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kadlecsik Jozsef
Sent: 26 March 2009 22:47
To: linux clustering
Subject: [Linux-cluster] Freeze with cluster-2.03.11
Hi,
A freshly built cluster-2.03.11 reproducibly freezes as soon as mailman is
started.
The versions are:
linux-2.6.27.21
cluster-2.03.11
openais from svn, subrev 1152 version 0.80
LVM2.2.02.44
This is a five-node cluster which was just upgraded from cluster-2.01.00,
node by node. All nodes went fine except the last one, which runs the
mailman queue manager: after the upgrade, as soon as the manager is
started, the system freezes completely. There is no error message on the
screen or in the kernel log. The system responds to ping, but that's all;
nothing can be done at the console except rebooting. Usually, when this node
is fenced off, the fencing node freezes as well shortly afterwards. What I
could find in the kernel log of this second machine is as follows:
Mar 26 23:09:24 lxserv1 kernel: dlm: closing connection to node 1
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Trying to acquire journal lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Trying to acquire journal lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Looking at journal...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Looking at journal...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Acquiring the transaction lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Acquiring the transaction lock...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Replaying journal...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Replaying journal...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Replayed 65 of 85 blocks
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: replays = 65, skips = 12, sames = 8
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Replayed 888 of 994 blocks
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: replays = 888, skips = 66, sames = 40
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Journal replayed in 1s
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Done
Does this indicate anything that could help to fix the cluster?
Best regards,
Jozsef
--
E-mail : kadlec at mail.kfki.hu, kadlec at blackhole.kfki.hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
H-1525 Budapest 114, POB. 49, Hungary