
[Linux-cluster] rgmanager is jammed

I am in the process of upgrading one of our clusters from RHEL 6.1 to 6.2. It's an 8-node cluster.

I started with one node: stop all cluster resources, cman, rgmanager et al., run yum update, reboot, then move on to the next node. The first one went fine.
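
For reference, the per-node sequence looks roughly like the sketch below. The `run` wrapper, the `upgrade_node` function and the `DRY_RUN` variable are only there for illustration and are not part of any cluster tooling. I am assuming rgmanager is stopped before cman, and the other cluster services covered by "et al." are not listed here.

```shell
#!/bin/sh
# Sketch of the per-node upgrade sequence described above.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for one node; any other cluster services
# would need to be stopped at the appropriate point as well.
upgrade_node() {
    run service rgmanager stop   # stop the resource group manager first
    run service cman stop        # then the cluster manager
    run yum update               # pull in the 6.2 packages
    run reboot                   # come back up on the new kernel
}

upgrade_node
```

Run as-is, it only prints the four commands in order, so the sequence can be checked before running it for real with `DRY_RUN=0`.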

On the second one, rgmanager started but doesn't seem to connect to the other nodes. I found this in dmesg:

INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D 0000000000000000     0  2901   2900 0x00000080
 ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
 ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
 ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
 [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
 [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
 [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
 [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
 [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
 [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
 [<ffffffff81176918>] vfs_write+0xb8/0x1a0
 [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81177321>] sys_write+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

I tried rebooting, but the shutdown stalled while stopping rgmanager. I then fenced the node, with the same outcome.

Any hints?
