[Linux-cluster] rgmanager is jammed

Nicolas Ross rossnick-lists at cybercat.ca
Fri May 25 16:20:43 UTC 2012


I am in the process of upgrading one of our clusters from RHEL 6.1 to 
6.2. It's an 8-node cluster.

I am going one node at a time: stop all cluster resources, cman, 
rgmanager et al., run yum update, reboot, then move on to the next 
node. The first one went fine.

On the second one, rgmanager started but does not seem to connect to 
the other nodes. I found this in dmesg:

INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D 0000000000000000     0  2901   2900 0x00000080
  ffff880667299d48 0000000000000082 0000000000000000 ffff8806656aa318
  ffff88066729c378 0000000000000001 ffff880665bb31b0 00007fffc6c6fa20
  ffff88066635a678 ffff880667299fd8 000000000000f4e8 ffff88066635a678
Call Trace:
  [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
  [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
  [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
  [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
  [<ffffffffa02ca94d>] device_write+0x30d/0x7d0 [dlm]
  [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
  [<ffffffff8120c646>] ? security_file_permission+0x16/0x20
  [<ffffffff81176918>] vfs_write+0xb8/0x1a0
  [<ffffffff810d4932>] ? audit_syscall_entry+0x272/0x2a0
  [<ffffffff81177321>] sys_write+0x51/0x90
  [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
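
For anyone wanting to pull the blocked task and its stack out of a longer 
dmesg dump, the following Python sketch may help. It is illustrative only: 
the function name parse_hung_tasks and both regular expressions are 
assumptions based on the message format shown above, not part of any 
cluster tool, and other kernel versions may format these lines differently.

```python
import re

# Assumed format of the hung-task header, taken from the dmesg output above.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Assumed format of one call-trace frame: [<addr>] [? ]func+0xOFF/0xLEN [module]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<uncertain>\? )?"
    r"(?P<func>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)

# A shortened copy of the trace from this message, used as sample input.
SAMPLE = """\
INFO: task rgmanager:2901 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rgmanager     D 0000000000000000     0  2901   2900 0x00000080
Call Trace:
  [<ffffffff814ee6fe>] __mutex_lock_slowpath+0x13e/0x180
  [<ffffffff814ee59b>] mutex_lock+0x2b/0x50
  [<ffffffffa02c192c>] dlm_new_lockspace+0x3c/0xa30 [dlm]
  [<ffffffff8115f74c>] ? __kmalloc+0x20c/0x220
"""


def parse_hung_tasks(text):
    """Return one dict per hung-task report found in the dmesg text.

    Each frame is a tuple (function, module or None, reliable), where
    reliable is False for frames the kernel marked with a leading '?'.
    """
    reports = []
    current = None
    for line in text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "name": header.group("name"),
                "pid": int(header.group("pid")),
                "seconds": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(
                (
                    frame.group("func"),
                    frame.group("module"),
                    frame.group("uncertain") is None,
                )
            )
    return reports


if __name__ == "__main__":
    for report in parse_hung_tasks(SAMPLE):
        print(report["name"], report["pid"], report["seconds"])
        for func, module, reliable in report["frames"]:
            print("  ", func, module or "-", "" if reliable else "(unreliable)")
```

Fed the full output of dmesg instead of SAMPLE, this lists each blocked 
task with its frames, which makes it easier to see that rgmanager is 
waiting on a mutex inside dlm_new_lockspace.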

I tried rebooting, but the shutdown stalled while stopping rgmanager. 
I fenced the node, with the same outcome.

Any hints?



