[Linux-cluster] Multiple "rgmanager" instances after re-booting from a kernel panic.

Demetres Pantermalis dpant at intracom-telecom.com
Tue Jan 28 14:28:21 UTC 2014


Hello.

We have a strange situation with rgmanager on a two node
(active/passive) cluster, on physical servers.
Nodes are N1 and N2.
Our goal is to simulate how the cluster reacts to a
kernel panic on the active node.
Kernel panic is "simulated" with echo b > /proc/sysrq-trigger
Fence agent used: fence_scsi
Common storage is: EMC (with powerPath installed on both nodes)

The scenarios are:
S1. Active node is N1, passive node is N2.
After a kernel panic on N1, N2 resumes the services previously running on
N1 (Expected behavior).
N1 reboots and after a while rejoins the cluster. (Expected behavior)

S2. Now, active node is N2.
We perform a kernel panic on N2. N1 correctly resumes the services
previously running on N2.
After the reboot of N2, cman starts OK (along with all related
processes), as does clvmd.
But rgmanager seems to hang, and in 'ps' it appears three
times (normally there are two).
The logs from N2 show:
rgmanager[4985]: I am node #2
rgmanager[4985]: Resource Group Manager Starting
rgmanager[4985]: Loading Service Data

ps -ef | grep rgmanager
root      4983     1  0 15:19 ?        00:00:00 rgmanager
root      4985  4983  0 15:19 ?        00:00:00 rgmanager
root      5118  4985  0 15:19 ?        00:00:00 rgmanager
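As a quick sanity check, the instance count can be verified with grep. The sketch below feeds the captured ps output above in as a here-document so the result is reproducible; on a live node you would pipe `ps -ef` instead.

```shell
# Count rgmanager processes in the captured ps output.
# Normally there are two (the main daemon plus one child); three
# suggests an extra, possibly stuck, worker.
count=$(grep -c 'rgmanager$' <<'EOF'
root      4983     1  0 15:19 ?        00:00:00 rgmanager
root      4985  4983  0 15:19 ?        00:00:00 rgmanager
root      5118  4985  0 15:19 ?        00:00:00 rgmanager
EOF
)
echo "$count"
```

On a live node, `ps -ef | grep -c '[r]gmanager'` gives the same count (the bracket trick keeps grep from matching its own command line); any value above two indicates the situation described here.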

Versions:
rgmanager-3.0.12.1-19.el6.x86_64
cman-3.0.12.1-59.el6.x86_64
corosync-1.4.1-17.el6.x86_64
fence-agents-3.1.5-35.el6.x86_64
clusterlib-3.0.12.1-59.el6.x86_64
lvm2-cluster-2.02.87-6.el6.x86_64


Any help appreciated.
Demetres.