
[Linux-cluster] Freeze with cluster-2.03.11



Hi,

A freshly built cluster-2.03.11 reproducibly freezes when mailman is started.
The versions are:

linux-2.6.27.21
cluster-2.03.11
openais from svn, subrev 1152 version 0.80
LVM2.2.02.44

This is a five-node cluster which was just upgraded from cluster-2.01.00,
node by node. All nodes went fine except the last one, which runs the
mailman queue manager: after the upgrade, as soon as the manager is
started, the system freezes completely. There is no error message on the
screen or in the kernel log. The system responds to ping, but that's all;
nothing can be done at the console except rebooting. Usually, when this
node is fenced off, the fencing node freezes shortly afterwards as well.
What I could find in the kernel log of this second machine is as follows:

Mar 26 23:09:24 lxserv1 kernel: dlm: closing connection to node 1
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Trying to acquire journal lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Trying to acquire journal lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Looking at journal...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Looking at journal...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Acquiring the transaction lock...
Mar 26 23:09:25 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Acquiring the transaction lock...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Replaying journal...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Replaying journal...
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Replayed 65 of 85 blocks
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: replays = 65, skips = 12, sames = 8
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Replayed 888 of 994 blocks
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: replays = 888, skips = 66, sames = 40
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:home.1: jid=3: Journal replayed in 1s
Mar 26 23:09:26 lxserv1 kernel: GFS: fsid=kfki:services.1: jid=3: Done

Does it indicate anything that could help fix the cluster?

Best regards,
Jozsef
--
E-mail : kadlec mail kfki hu, kadlec blackhole kfki hu
PGP key: http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address: KFKI Research Institute for Particle and Nuclear Physics
         H-1525 Budapest 114, POB. 49, Hungary

