[Linux-cluster] Info on restart of non critical resources

Tue Nov 19 10:06:56 UTC 2013

Hello,
I have a cluster running RHEL 6.3 with:
cman-3.0.12.1-32.el6_3.2.x86_64
rgmanager-3.0.12.1-12.el6.x86_64

I put sshd under cluster control by modifying the default init script.
I then configured it as a non-critical resource in a service section:

<resources>
...
<script file="/etc/init.d/sshd" name="clusterssh"/>
</resources>
...
                <service autostart="0" domain="PABX" name="PABX">
                <resource 1 ...>
. . .
                <script __independent_subtree="2" ref="clusterssh"/>
                <resource
                </service>

Suppose the PID of the sshd process bound to the VIP is 2689, and I kill it:

[root@myserver cluster]# kill -9 2689

[root@myserver cluster]# tail -f /var/log/messages
Nov 15 16:30:22 myserver rgmanager[4694]: [script] Executing /etc/init.d/sshd status
Nov 15 16:30:22 myserver rgmanager[4722]: [script] script:clusterssh: status of /etc/init.d/sshd failed (returned 1)
Nov 15 16:30:22 myserver rgmanager[11542]: status on script "clusterssh" returned 1 (generic error)
Nov 15 16:30:22 myserver rgmanager[11542]: Some independent resources in service:PABX failed; Attempting inline recovery
Nov 15 16:30:22 myserver rgmanager[4753]: [script] Executing /etc/init.d/sshd stop
Nov 15 16:30:22 myserver rgmanager[11542]: Inline recovery of service:PABX complete
Nov 15 16:30:22 myserver rgmanager[11542]: Note: Some noncritical resources were stopped during recovery.
Nov 15 16:30:22 myserver rgmanager[11542]: Run 'clusvcadm -c service:PABX' to restore them to operation.

The ssh resource remains stopped and the service gets a [P] flag in
clustat output.

# clustat
Cluster Status for mycluster @ Fri Nov 15 16:30:54 2013
Member Status: Quorate

 Member Name          ID   Status
 node1                 1   Online, rgmanager
 node2                 2   Online, Local, rgmanager
 /dev/block/253:5      0   Online, Quorum Disk

 Service Name         Owner (Last)         State
 service:PABX         node2                started    [P]

The suggested command
clusvcadm -c service:PABX
brings it back online:

Nov 15 16:31:22 myserver rgmanager[11542]: Repairing service:PABX
Nov 15 16:31:22 myserver rgmanager[6787]: [script] Executing /etc/init.d/sshd start
Nov 15 16:31:22 myserver rgmanager[11542]: Repair of service:PABX was successful
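
As a stopgap, the manual repair could presumably be scripted. The following is only a minimal sketch, not something I have run on the cluster: it assumes clustat prints each service on a single line with the [P] flag on that same line, and the function names partial_services and repair_partial_services are hypothetical.

```shell
#!/bin/sh
# Print the names of services whose clustat line carries the [P] (partial) flag.
# Reads clustat output on stdin. Assumption: one service per line, with the
# service name (service:NAME) in the first column.
partial_services() {
    awk '$1 ~ /^service:/ && /\[P\]/ { print $1 }'
}

# Hypothetical wrapper: run the repair command suggested by rgmanager
# ('clusvcadm -c') for every service currently flagged [P].
repair_partial_services() {
    clustat | partial_services | while read -r svc; do
        clusvcadm -c "$svc"
    done
}
```

Sourcing this file and calling repair_partial_services periodically (for example from cron) would retry the repair, but that is still a workaround outside rgmanager and not an in-place restart.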

Is this the expected behaviour? Is there any way to configure a
non-critical resource so that it is restarted in place, without
manual intervention?

Thanks in advance,
Gianluca